Abstract

This paper studies convergence properties of optimal values and actions for discounted and average- 
cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies 
these properties to the stochastic periodic-review inventory control problem with backorders, positive 
setup costs, and convex holding/backordering costs. The following results are established for MDPs 
with possibly noncompact action sets and unbounded cost functions: (i) convergence of value iterations 
to optimal values for discounted problems with possibly non-zero terminal costs, (ii) convergence of 
optimal finite-horizon actions to optimal infinite-horizon actions for total discounted costs, as the time 
horizon tends to infinity, and (iii) convergence of optimal discount-cost actions to optimal average-cost 
actions for infinite-horizon problems, as the discount factor tends to 1. 

When applied to the setup-cost inventory control problem, the general results on MDPs imply the
optimality of (s, S) policies and convergence properties of optimal thresholds. In particular, this paper
analyzes the setup-cost inventory control problem without two assumptions often used in the literature:
(a) the demand is either discrete or continuous, or (b) the backordering cost is higher than the cost of
backordered inventory if the amount of backordered inventory is large.
1 Introduction


Since Scarf [28] proved the optimality of (s, S) policies for finite-horizon problems with continuous demand,
there have been significant efforts to extend this result to other models. Arthur F. Veinott [34, 35] was
one of the pioneers in this exploration, and he combined a deep understanding of Markov decision processes
with a passion for the study of inventory control. It is a great pleasure to dedicate this paper to him.

This paper introduces new results on Markov Decision Processes (MDPs) with infinite state spaces,
weakly continuous transition probabilities, one-step costs that can be unbounded, and possibly noncompact
action sets under the discounted and average-cost criteria. The results on MDPs are applied to the stochastic
periodic-review setup-cost inventory control problem. We show that this problem satisfies general conditions
sufficient for the existence of optimal policies, the validity of the optimality equations, and the convergence
of value iterations. In particular, these results are used to show the optimality of (s, S) policies for
finite-horizon problems, and for infinite-horizon problems with the discounted and long-term average-cost criteria.

Since the 1950s, inventory control has been one of the major motivations for studying MDPs. However,
until recently there has been a gap between the available results in MDP theory and the results needed to
analyze inventory control problems. Even now, most work on inventory control assumes that the demand is
either discrete or continuous. Moreover, the proofs are often problem-specific and do not use general results
on MDPs, which often provide additional insight. For example, Theorem 6.10 below states convergence
properties of optimal thresholds in addition to the existence of optimal (s, S) policies, and the proof of this
theorem is based on Theorem 3.6 and Corollary 4.4 established for MDPs.

With such a long history, the inventory control literature is far too extensive to attempt a complete literature
review. The reader is pointed to the books by Bensoussan [1], Beyer et al. [4], Heyman and Sobel [22],
Porteus [26], Simchi-Levi et al. [32], and Zipkin [39]. Applications of MDPs to inventory control are also
discussed in Bertsekas [2]. For inventory control under the average cost criterion, the optimality
of (s, S) policies was established by Iglehart [24] and Veinott and Wagner [36] in the continuous and
discrete demand cases, respectively. As explained in detail in Beyer and Sethi [5, p. 526], the analysis in
Iglehart [24] assumes the existence of a demand density. The proofs for discrete demand distributions were
significantly simplified by Zheng [38]. Zabel [37] corrected Scarf's [28] results on finite-horizon inventory
control. Beyer and Sethi [5] described and corrected gaps in the proofs in [24, 36] on infinite-horizon inventory
control with long-term average costs. Almost all studies of infinite-horizon inventory control deal with
either discrete or continuous demand. In some cases, the choice between discrete and continuous
distributions depends on a particular application. There is an important practical reason why many studies
use discrete demand: in operations management practice, the overwhelming majority of information systems
record integer quantities of demand and stock level. Without assumptions that the demand is discrete
or continuous, the optimality of (s, S) policies for average cost inventory control problems follows from
Chen and Simchi-Levi [9], where coordinated price-inventory control is studied under some technical assumptions
and methods specific to inventory control are used. Huh et al. [23] developed additional problem-specific
methods for inventory control problems with arbitrarily distributed demands. Under some additional
assumptions, including the assumption that holding costs are bounded above by polynomial functions,
(s, S) policies are also optimal when the demand evolves according to a Markov chain; see
Beyer et al. [4] and the references therein.

Early studies of MDPs dealt with finite-state problems and infinite-state problems with bounded costs.
The case of average costs per unit time is more difficult than the case of total discounted costs. Sennott [31]
developed the theory for the average-cost criterion for countable-state problems with unbounded costs. Schäl
[29, 30] developed the theory for uncountable-state problems with discounted and average-cost criteria when
action sets are compact. In particular, Schäl [29, 30] identified two groups of assumptions on transition
probabilities: weak continuity and setwise continuity. As explained in Feinberg and Lewis [17, Section 4],
models with weakly continuous transition probabilities are more natural for inventory control than models
with setwise continuous transition probabilities. Hernández-Lerma and Lasserre [21] developed the theory
for problems with setwise continuous transition probabilities, unbounded costs, and possibly noncompact
action sets. Luque-Vásquez and Hernández-Lerma [25] identified an additional technical difficulty in dealing
with problems with weakly continuous transition probabilities, even for finite-horizon problems, by demonstrating
that Berge's theorem, which ensures semi-continuity of the value function, does not hold for problems
with noncompact action sets. Feinberg and Lewis [17] investigated total discounted costs for inf-compact
cost functions and obtained sufficient optimality conditions for average costs. Compared to Schäl [30], these
results required an additional local boundedness assumption that holds for inventory control problems but
is not easy to verify. Feinberg et al. [14, 15] introduced a natural class of $\mathbb{K}$-inf-compact cost
functions, extended Berge's theorem to noncompact action sets, and developed the theory of MDPs with weakly
continuous transition probabilities, unbounded costs, and the criteria of total discounted costs and long-term
average costs. In particular, the results from [14] do not require the local boundedness
assumption, which simplifies their applications to inventory control problems. Such applications are considered
in Section 6 below. The tutorial by Feinberg [11] describes in detail the applicability of recent results
on MDPs to inventory control.

Section 2 of this paper describes an MDP with an infinite state space, weakly continuous transition
probabilities, possibly unbounded one-step costs, and possibly noncompact action sets. Sections 3 and
4 provide the results for discounted and average cost criteria. In particular, new results are provided on
the following topics: (i) convergence of value iterations for discounted problems with possibly non-zero
terminal values (Corollary 3.5), (ii) convergence of optimal finite-horizon actions to optimal infinite-horizon
actions for total discounted costs, as the time horizon tends to infinity (Theorem 3.6), and (iii) convergence of
optimal discount-cost actions to optimal average-cost actions for infinite-horizon problems, as the discount
factor tends to 1 (Theorems 4.3 and 4.5). Studying the convergence of value iterations and optimal actions
for discounted costs with non-zero terminal values in this paper is motivated by inventory control. As was
understood by Veinott and Wagner [36], without additional assumptions (s, S) policies may not be optimal
for problems with discounted costs, but they are optimal for large values of discount factors. Even for large
discount factors, (s, S) policies may not be optimal for finite-horizon problems with discounted cost criteria
and zero terminal costs. However, (s, S) policies are optimal for such problems with appropriately
chosen nonzero terminal costs, and this observation is useful for proving the optimality of (s, S) policies for
infinite-horizon problems.

Section 5 relates MDPs to problems whose dynamics are defined by stochastic equations, as is the case
for inventory control. Section 6 studies the inventory control problem with backorders, setup costs,
linear ordering costs, and convex holding costs and provides two results on the existence of discounted
and average-cost optimal (s, S) policies. The first result, Theorem 6.10, states the existence of optimal
(s, S) policies for large discount factors and average costs. It does not use any additional assumptions, and
the proof is based on adding terminal costs to finite-horizon problems. The second result, Theorem 6.12,
states the existence of optimal (s, S) policies for all discount factors under an additional assumption that
it is expensive to keep a large backordered amount of inventory. Such assumptions are often used in the
literature; see Bertsekas [2], Beyer et al. [4], Chen and Simchi-Levi [8, 9], Huh et al. [23], and Veinott and
Wagner [36]. Theorems 6.10 and 6.12 also describe the convergence properties of optimal thresholds for
the following two cases: (i) the horizon length tends to infinity, and (ii) the discount factor tends to 1.

To conclude the introduction, we note that the results on MDPs with weakly continuous transition
probabilities, noncompact action sets, and unbounded costs presented in this paper are
applicable to a wide range of engineering and managerial problems. Potential applications include
resource allocation problems, control of workload in queues, and a large variety of inventory control
problems. In particular, the presented results should be applicable to combined pricing-inventory control and to
supply chain management; see [8, 9, 32]. Moreover, as mentioned above, the results for MDPs presented
below significantly simplify the analysis of the stochastic cash balance problem investigated in [17] because
the current results do not require verifying the local boundedness assumption introduced in [17]; instead,
Theorem 4.1 below can be employed. The periodic-review setup-cost inventory control problem was
selected as an application in this paper mainly because it is probably the most studied inventory control
model. We provide new results for this classic problem.

2 Definition of MDPs with Borel State and Action Sets 

Consider a discrete-time Markov decision process with the state space $\mathbb{X}$, action space $\mathbb{A}$, one-step costs $c$,
and transition probabilities $q$. The state space $\mathbb{X}$ and the action space $\mathbb{A}$ are both assumed to be Borel subsets
of Polish (complete separable metric) spaces. If an action $a\in\mathbb{A}$ is selected at a state $x\in\mathbb{X}$, then a cost
$c(x,a)$ is incurred, where $c:\mathbb{X}\times\mathbb{A}\to\overline{\mathbb{R}}=\mathbb{R}\cup\{+\infty\}$, and the system moves to the next state according
to the probability distribution $q(\cdot|x,a)$ on $\mathbb{X}$. The function $c$ is assumed to be bounded below and Borel
measurable, and $q$ is a transition probability, that is, $q(B|x,a)$ is a Borel function on $\mathbb{X}\times\mathbb{A}$ for each Borel
subset $B$ of $\mathbb{X}$, and $q(\cdot|x,a)$ is a probability measure on the Borel $\sigma$-field of $\mathbb{X}$ for each $(x,a)\in\mathbb{X}\times\mathbb{A}$.

The decision process proceeds as follows: at each time $t=0,1,\ldots$ the current state of the system, $x_t$, is
observed. A decision-maker chooses an action $a_t$, the cost $c(x_t,a_t)$ is accrued, the system
moves to the next state according to $q(\cdot|x_t,a_t)$, and the process continues. Let $H_t=(\mathbb{X}\times\mathbb{A})^t\times\mathbb{X}$ be
the set of histories for $t=0,1,\ldots$. A (randomized) decision rule at epoch $t=0,1,\ldots$ is a regular
transition probability $\pi_t$ from $H_t$ to $\mathbb{A}$. In other words, (i) $\pi_t(\cdot|h_t)$ is a probability distribution on $\mathbb{A}$, where
$h_t=(x_0,a_0,x_1,\ldots,a_{t-1},x_t)$, and (ii) for any measurable subset $B\subseteq\mathbb{A}$, the function $\pi_t(B|\cdot)$ is measurable
on $H_t$. A policy $\pi$ is a sequence $(\pi_0,\pi_1,\ldots)$ of decision rules. Moreover, $\pi$ is called non-randomized if
each probability measure $\pi_t(\cdot|h_t)$ is concentrated at one point. A non-randomized policy is called Markov if
all decisions depend only on the current state and time. A Markov policy is called stationary if all decisions
depend only on the current state. Thus, a Markov policy $\phi$ is defined by a sequence $\phi_0,\phi_1,\ldots$ of measurable
mappings $\phi_t:\mathbb{X}\to\mathbb{A}$. A stationary policy $\phi$ is defined by a measurable mapping $\phi:\mathbb{X}\to\mathbb{A}$.

The Ionescu Tulcea theorem (see [3, pp. 140-141] or [21, p. 178]) implies that an initial state $x$ and a
policy $\pi$ define a unique probability distribution $P_x^\pi$ on the set of all trajectories $H_\infty=(\mathbb{X}\times\mathbb{A})^\infty$ endowed
with the product $\sigma$-field defined by the Borel $\sigma$-fields of $\mathbb{X}$ and $\mathbb{A}$. Let $\mathbb{E}_x^\pi$ be the expectation with respect to this
distribution. For a finite horizon $N=0,1,\ldots$ and a bounded below measurable function $F:\mathbb{X}\to\overline{\mathbb{R}}$, called
the terminal value, define the expected total discounted costs

$$
v^{\pi}_{N,F,\alpha}(x):=\mathbb{E}^{\pi}_{x}\Big[\sum_{t=0}^{N-1}\alpha^{t}c(x_t,a_t)+\alpha^{N}F(x_N)\Big],\qquad x\in\mathbb{X}, \tag{2.1}
$$

where $\alpha\in[0,1)$ and $v^{\pi}_{0,F,\alpha}(x)=F(x)$, $x\in\mathbb{X}$. When $F(x)=0$ for all $x\in\mathbb{X}$, we write $v^{\pi}_{N,\alpha}(x)$ instead
of $v^{\pi}_{N,F,\alpha}(x)$. When $N=\infty$ and $F(x)=0$ for all $x\in\mathbb{X}$, (2.1) defines the infinite-horizon expected total
discounted cost of $\pi$, denoted by $v^{\pi}_{\alpha}(x)$ instead of $v^{\pi}_{\infty,\alpha}(x)$. The average costs per unit time are defined as


$$
w^{\pi}(x):=\limsup_{N\to\infty}\frac{1}{N}\,\mathbb{E}^{\pi}_{x}\sum_{t=0}^{N-1}c(x_t,a_t). \tag{2.2}
$$

For each function $V^{\pi}(x)=v^{\pi}_{N,F,\alpha}(x)$, $v^{\pi}_{N,\alpha}(x)$, $v^{\pi}_{\alpha}(x)$, or $w^{\pi}(x)$, define the optimal cost

$$
V(x):=\inf_{\pi\in\Pi}V^{\pi}(x), \tag{2.3}
$$

where $\Pi$ is the set of all policies. A policy $\pi$ is called optimal for the respective criterion if $V^{\pi}(x)=V(x)$
for all $x\in\mathbb{X}$.
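For intuition, here is a minimal sketch of (2.1) for a hypothetical finite MDP, where the integral against $q$ becomes a finite sum. It assumes the policy is stationary, so the expectation in (2.1) can be computed by backward recursion instead of enumerating trajectories. The names `evaluate_policy`, `c`, `P`, `phi`, and `F` and the toy data are illustrative, not notation from the cited works.

```python
def evaluate_policy(c, P, phi, F, alpha, N):
    """Return v^phi_{N,F,alpha}(x) for every state x of a finite MDP.

    c[x][a]    -- one-step cost c(x, a)
    P[x][a][y] -- transition probability q({y} | x, a)
    phi[x]     -- action of the stationary policy at state x
    F[x]       -- terminal value
    Backward recursion: v_0 = F and
    v_{k+1}(x) = c(x, phi(x)) + alpha * sum_y q({y} | x, phi(x)) * v_k(y).
    """
    n = len(c)
    v = list(F)
    for _ in range(N):
        v = [c[x][phi[x]] + alpha * sum(P[x][phi[x]][y] * v[y] for y in range(n))
             for x in range(n)]
    return v


# Two states, two actions; every transition is deterministic in this toy example.
c = [[1.0, 2.0], [0.0, 3.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: action 0 stays, action 1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # from state 1: action 0 moves to 0, action 1 stays
print(evaluate_policy(c, P, phi=[0, 0], F=[10.0, 10.0], alpha=0.5, N=2))  # [4.0, 3.0]
```

From state 1 this policy pays 0, then 1, and then the discounted terminal value, so (2.1) gives $0+0.5\cdot1+0.25\cdot10=3$, which the recursion reproduces.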

We remark that the definition of an MDP usually includes the sets of available actions $A(x)\subseteq\mathbb{A}$, $x\in\mathbb{X}$.
We do not do this explicitly because we allow $c(x,a)$ to be equal to $+\infty$. In other words, a feasible pair
$(x,a)$ is modeled as a pair with finite cost. To transform this model into one with feasible action sets, it is
sufficient to consider sets of available actions $A(x)$ such that $A(x)\supseteq A_c(x)$, where $A_c(x)=\{a\in\mathbb{A}:
c(x,a)<+\infty\}$, $x\in\mathbb{X}$. In particular, it is possible to set $A(x):=A_c(x)$, $x\in\mathbb{X}$. In order to transform
an MDP with action sets $A(x)$ into an MDP with action sets $\mathbb{A}$, $x\in\mathbb{X}$, it is sufficient to set $c(x,a)=+\infty$
when $a\in\mathbb{A}\setminus A(x)$. Of course, certain measurability conditions should hold, but this is not an issue when
the function $c$ is measurable. We remark that early works on MDPs by Blackwell [7] and Strauch [33]
considered models with $A(x)=\mathbb{A}$ for all $x\in\mathbb{X}$. This approach caused some problems with the generality
of the results because the cost function $c$ was assumed to be bounded, and therefore $c(x,a)\in\mathbb{R}$ for all
$(x,a)$. If the cost function is allowed to take infinitely large values, models with $A(x)=\mathbb{A}$ are as general as
models with $A(x)\subseteq\mathbb{A}$, $x\in\mathbb{X}$.
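The two transformations described above are easy to state in code for a finite model. The sketch below assumes costs and feasible sets are stored in dictionaries; the function names and the action labels `"order"` and `"wait"` are hypothetical. It sets $c(x,a)=+\infty$ outside $A(x)$ and then recovers $A_c(x)$ from the costs.

```python
import math


def to_unconstrained_cost(cost, feasible):
    """Model with action sets A(x) -> model with action set A: c(x, a) = +inf if a is not in A(x)."""
    return {x: {a: (cost[x][a] if a in feasible[x] else math.inf) for a in cost[x]}
            for x in cost}


def finite_cost_actions(c):
    """Recover A_c(x) = {a : c(x, a) < +inf}."""
    return {x: {a for a, value in row.items() if value < math.inf} for x, row in c.items()}


cost = {0: {"order": 5.0, "wait": 1.0}, 1: {"order": 5.0, "wait": 2.0}}
feasible = {0: {"wait"}, 1: {"order", "wait"}}
c = to_unconstrained_cost(cost, feasible)
print(c[0]["order"], finite_cost_actions(c))
```

In this example $A_c(x)$ coincides with $A(x)$, the minimal admissible choice mentioned above.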

3 Optimality Results for Discounted Cost MDPs with Borel State and Action Sets

It is well-known (see, e.g., [3, Proposition 8.2]) that $v_{t,F,\alpha}(x)$ satisfies the following optimality equation:

$$
v_{t+1,F,\alpha}(x)=\inf_{a\in\mathbb{A}}\Big\{c(x,a)+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X},\ t=0,1,\ldots. \tag{3.1}
$$
In addition, a Markov policy $\phi$, defined at the first $N+1$ steps by the mappings $\phi_0,\ldots,\phi_N$ satisfying the
following equations for all $x\in\mathbb{X}$ and all $t=0,\ldots,N$,

$$
v_{t+1,F,\alpha}(x)=c(x,\phi_{N-t}(x))+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,\phi_{N-t}(x)), \tag{3.2}
$$

is optimal for the horizon $N+1$; see, e.g., [3, Lemma 8.7].
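For a finite model, (3.1) and the selection rule (3.2) amount to standard backward induction. The sketch below works under that finiteness assumption; the helper name `backward_induction` and the toy data are hypothetical. Note the index reversal: the minimizer computed from $v_{t,F,\alpha}$ is the decision rule $\phi_{N-t}$.

```python
def backward_induction(c, P, F, alpha, N):
    """Solve the (N+1)-horizon problem with terminal value F on a finite MDP.

    Returns (values, policy): values[t] is v_{t,F,alpha} for t = 0, ..., N+1,
    and policy[k] is the decision rule phi_k used at epoch k = 0, ..., N.
    """
    n, m = len(c), len(c[0])
    values = [list(F)]
    rules = []  # rules[t] is a minimizer in (3.1) computed from v_{t,F,alpha}, i.e. phi_{N-t}
    for _ in range(N + 1):
        v = values[-1]
        q_values = [[c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
                     for a in range(m)] for x in range(n)]
        values.append([min(row) for row in q_values])
        rules.append([row.index(min(row)) for row in q_values])
    return values, rules[::-1]  # phi_k = rules[N - k]


# Toy model: in state 0, action 0 costs 1 and stays, action 1 costs 2 and moves to
# the absorbing, cost-free state 1.
c = [[1.0, 2.0], [0.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
values, policy = backward_induction(c, P, F=[0.0, 0.0], alpha=0.9, N=2)
print(policy)     # [[1, 0], [0, 0], [0, 0]]
print(values[3])  # [2.0, 0.0]
```

With three epochs to go, paying 2 once is cheaper than paying $1+0.9+0.81=2.71$, so the rule at epoch 0 differs from the rules at later epochs. This is why (3.2) uses the time-dependent index $N-t$.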

It is also well-known (see, e.g., [3, Proposition 9.8]) that $v_\alpha(x)$ satisfies the following discounted cost
optimality equation:

$$
v_\alpha(x)=\inf_{a\in\mathbb{A}}\Big\{c(x,a)+\alpha\int_{\mathbb{X}}v_\alpha(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X}. \tag{3.3}
$$

According to [3, Proposition 9.12], a stationary policy $\phi^\alpha$ is optimal if and only if

$$
v_\alpha(x)=c(x,\phi^\alpha(x))+\alpha\int_{\mathbb{X}}v_\alpha(y)\,q(dy|x,\phi^\alpha(x)),\qquad x\in\mathbb{X}. \tag{3.4}
$$
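In the finite case, (3.3) can be approximated by value iteration, and a stationary policy can then be read off from (3.4). The sketch below assumes finite state and action sets and nonnegative costs, so the iterates started from zero are nondecreasing. The function names, the tolerance, and the toy data are hypothetical. For a general model, a greedy policy built from an approximate $v$ is only approximately optimal.

```python
def value_iteration(c, P, alpha, tol=1e-10, max_iter=1_000_000):
    """Iterate (3.3) from v = 0 until successive iterates differ by less than tol."""
    n, m = len(c), len(c[0])
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [min(c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
                     for a in range(m)) for x in range(n)]
        if max(abs(new - old) for new, old in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def greedy_policy(c, P, v, alpha):
    """Return phi with phi(x) attaining the minimum on the right-hand side of (3.3)."""
    n, m = len(c), len(c[0])
    policy = []
    for x in range(n):
        q_values = [c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
                    for a in range(m)]
        policy.append(q_values.index(min(q_values)))
    return policy


c = [[1.0, 2.0], [0.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
v_alpha = value_iteration(c, P, alpha=0.9)
print(v_alpha, greedy_policy(c, P, v_alpha, alpha=0.9))  # [2.0, 0.0] [1, 0]
```

Staying in state 0 forever would cost $1/(1-0.9)=10$, so moving once at cost 2 is optimal, and the greedy policy satisfies (3.4).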

However, additional conditions on cost functions and transition probabilities are needed to ensure the
existence of optimal policies. Earlier conditions required compactness of action sets. They were introduced
by Schäl [29] and consisted of two sets of conditions that required either weak or setwise continuity
assumptions. For setwise continuous transition probabilities, Hernández-Lerma and Lasserre [21] extended
these conditions to MDPs with general action sets and cost functions $c(x,a)$ that are inf-compact in the
action variable $a$. Feinberg and Lewis [17] obtained results for weakly continuous transition probabilities
and inf-compact cost functions. Feinberg et al. [14] generalized and unified the results by Schäl [29] and
Feinberg and Lewis [17] for weakly continuous transition probabilities to more general cost functions by
using the notion of a $\mathbb{K}$-inf-compact function. $\mathbb{K}$-inf-compact functions were originally introduced in [14,
Assumption W*] without using the term $\mathbb{K}$-inf-compact, and were formally introduced and studied in Feinberg et
al. [13, 15]. As explained in Feinberg and Lewis [17, Section 4], weak continuity holds for periodic-review
inventory control problems. The setwise continuity assumption may not hold, but it holds for problems with
continuous or discrete demand distributions. This paper focuses on the essentially more general case of
weakly continuous transition probabilities.

Let $\mathbb{U}$ be a metric space and $U\subseteq\mathbb{U}$. Consider a function $f:U\to\overline{\mathbb{R}}$. For $V\subseteq U$ define the level sets

$$
\mathcal{D}_f(\lambda;V):=\{y\in V: f(y)\le\lambda\},\qquad\lambda\in\mathbb{R}. \tag{3.5}
$$

A function $f:U\to\overline{\mathbb{R}}$ is called lower semi-continuous at a point $y\in U$ if $f(y)\le\liminf_{n\to\infty}f(y^{(n)})$ for
every sequence $\{y^{(n)}\in U\}_{n=1,2,\ldots}$ converging to $y$. A function $f:U\to\overline{\mathbb{R}}$ is called lower semi-continuous
if it is lower semi-continuous at each $y\in U$. A function $f:U\to\overline{\mathbb{R}}$ is called inf-compact if all the level
sets $\mathcal{D}_f(\lambda;U)$ are compact. Inf-compact functions are lower semi-continuous. For three sets $U$, $V$, and $W$,
where $V\subseteq U$, and two functions $g:V\to W$ and $f:U\to W$, the function $g$ is called the restriction of $f$
to $V$ if $g(x)=f(x)$ when $x\in V$.

Definition 3.1 (cf. Feinberg et al. [15, 13], Feinberg and Kasyanov [12]) Let $\mathbb{S}^{(i)}$ be metric spaces and
$S^{(i)}\subseteq\mathbb{S}^{(i)}$ be their nonempty Borel subsets, $i=1,2$. A function $f:S^{(1)}\times S^{(2)}\to\overline{\mathbb{R}}$ is called $\mathbb{K}$-inf-compact
if for any nonempty compact subset $K$ of $S^{(1)}$, the restriction of $f$ to $K\times S^{(2)}$ is inf-compact.
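As a simple illustration of Definition 3.1 (our own example, not taken from the cited works), let $\mathbb{S}^{(1)}=S^{(1)}=\mathbb{S}^{(2)}=S^{(2)}=\mathbb{R}$ and $f(x,a)=|a-x|$. This function is $\mathbb{K}$-inf-compact but not inf-compact:

```latex
% Not inf-compact: the level set is unbounded.
\mathcal{D}_f(\lambda;\mathbb{R}^2)=\{(x,a): |a-x|\le\lambda\}\ni(n,n)\quad\text{for all } n=1,2,\ldots
% K-inf-compact: for a nonempty compact K \subseteq [-k,k], the level set of the restriction
% is closed (f is continuous and K is closed) and bounded, hence compact.
\mathcal{D}_f(\lambda;K\times\mathbb{R})=\{(x,a)\in K\times\mathbb{R}: |a-x|\le\lambda\}
\subseteq K\times[-k-\lambda,\;k+\lambda].
```

Costs that grow with the distance between the state and the chosen action fail to be inf-compact jointly in $(x,a)$, which shows why the weaker notion is useful.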
We are mainly interested in applying this definition to the function $f=c$, where $c$ is the one-step cost.
In this case, $\mathbb{X}$ and $\mathbb{A}$ are Borel subsets of the Polish spaces $\mathbb{S}^{(1)}$ and $\mathbb{S}^{(2)}$ mentioned in the definition of an
MDP. Inventory control applications often deal with $\mathbb{S}^{(1)}=\mathbb{X}$ and $\mathbb{S}^{(2)}=\mathbb{A}$. However, other applications
are possible. For example, assumption (ii) of Theorem 5.3 deals with $\mathbb{S}^{(1)}=\mathbb{A}$ and $\mathbb{S}^{(2)}=\mathbb{X}$.

The next proposition, which follows directly from Feinberg et al. [15, Lemma 2.1], demonstrates that
$\mathbb{K}$-inf-compact cost functions are natural generalizations of inf-compact cost functions considered in Feinberg
and Lewis [17] and of lower semi-continuous cost functions considered in the literature on MDPs with compact
action sets; see, e.g., Schäl [29, 30].

Proposition 3.2 The following two statements hold:

(i) an inf-compact function $f:\mathbb{X}\times\mathbb{A}\to\overline{\mathbb{R}}$ is $\mathbb{K}$-inf-compact;

(ii) if $A:\mathbb{X}\to 2^{\mathbb{A}}\setminus\{\emptyset\}$ is a compact-valued upper semi-continuous set-valued mapping and $f:\mathbb{X}\times\mathbb{A}\to
\overline{\mathbb{R}}$ is a lower semi-continuous function such that $f(x,a)=+\infty$ for $x\in\mathbb{X}$ and $a\in\mathbb{A}\setminus A(x)$, then
the function $f$ is $\mathbb{K}$-inf-compact, where $2^U$ denotes the set of all subsets of a set $U$.

Definition 3.3 The transition probability $q$ is called weakly continuous if

$$
\int_{\mathbb{X}}f(y)\,q(dy|x^{(n)},a^{(n)})\to\int_{\mathbb{X}}f(y)\,q(dy|x^{(0)},a^{(0)})\quad\text{as } n\to\infty, \tag{3.6}
$$

for every bounded continuous function $f:\mathbb{X}\to\mathbb{R}$ and for each sequence $\{(x^{(n)},a^{(n)}),\,n=1,2,\ldots\}$ on
$\mathbb{X}\times\mathbb{A}$ converging to $(x^{(0)},a^{(0)})\in\mathbb{X}\times\mathbb{A}$.

Assumption W*. The following conditions hold:

(i) the cost function $c$ is bounded below and $\mathbb{K}$-inf-compact;

(ii) if $(x^{(0)},a^{(0)})$ is a limit of a convergent sequence $\{(x^{(n)},a^{(n)}),\,n=1,2,\ldots\}$ of elements of $\mathbb{X}\times\mathbb{A}$ such
that $c(x^{(n)},a^{(n)})<+\infty$ for all $n=0,1,2,\ldots$, then the sequence $\{q(\cdot|x^{(n)},a^{(n)}),\,n=1,2,\ldots\}$
converges weakly to $q(\cdot|x^{(0)},a^{(0)})$; that is, (3.6) holds for every bounded continuous function $f$ on
$\mathbb{X}$.

For example, Assumption W*(ii) holds if the transition probability $q(\cdot|x,a)$ is weakly continuous on
$\mathbb{X}\times\mathbb{A}$. The following theorem describes the structure of optimal policies, continuity properties of value
functions, and convergence of value iterations.

Theorem 3.4 (Feinberg et al. [14, Theorem 2]) Suppose Assumption W* holds. For $t=0,1,\ldots$, $N=0,1,\ldots$,
and $\alpha\in[0,1)$, the following statements hold:

(i) the functions $\{v_{t,\alpha},\,t\ge0\}$ and $v_\alpha$ are lower semi-continuous on $\mathbb{X}$, and $v_{t,\alpha}(x)\to v_\alpha(x)$ as $t\to+\infty$
for each $x\in\mathbb{X}$;
(ii) the value functions $\{v_{t,\alpha},\,t\ge0\}$ satisfy the optimality equations

$$
v_{t+1,\alpha}(x)=\min_{a\in\mathbb{A}}\Big\{c(x,a)+\alpha\int_{\mathbb{X}}v_{t,\alpha}(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X}, \tag{3.7}
$$

and the nonempty sets $A_{t,\alpha}(x):=\{a\in\mathbb{A}:v_{t+1,\alpha}(x)=c(x,a)+\alpha\int_{\mathbb{X}}v_{t,\alpha}(y)\,q(dy|x,a)\}$, $x\in\mathbb{X}$,
satisfy the following properties:

(a) the graph $\mathrm{Gr}_{\mathbb{X}}(A_{t,\alpha})=\{(x,a):x\in\mathbb{X},\,a\in A_{t,\alpha}(x)\}$ is a Borel subset of $\mathbb{X}\times\mathbb{A}$;

(b) the following hold:

(b1) if $v_{t+1,\alpha}(x)=+\infty$, then $A_{t,\alpha}(x)=\mathbb{A}$;

(b2) if $v_{t+1,\alpha}(x)<+\infty$, then $A_{t,\alpha}(x)$ is compact;

(iii) for each horizon $N+1$, there exists a Markov optimal policy $(\phi_0,\ldots,\phi_N)$;

(iv) if for an $(N+1)$-horizon Markov policy $(\phi_0,\ldots,\phi_N)$ the inclusions $\phi_{N-t}(x)\in A_{t,\alpha}(x)$, $x\in\mathbb{X}$,
$t=0,\ldots,N$, hold, then this policy is $(N+1)$-horizon optimal;

(v) the value function $v_\alpha$ satisfies the optimality equation

$$
v_\alpha(x)=\min_{a\in\mathbb{A}}\Big\{c(x,a)+\alpha\int_{\mathbb{X}}v_\alpha(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X}; \tag{3.8}
$$

(vi) the nonempty sets $A_\alpha(x):=\{a\in\mathbb{A}:v_\alpha(x)=c(x,a)+\alpha\int_{\mathbb{X}}v_\alpha(y)\,q(dy|x,a)\}$, $x\in\mathbb{X}$, satisfy the
following properties:

(a) the graph $\mathrm{Gr}_{\mathbb{X}}(A_\alpha)=\{(x,a):x\in\mathbb{X},\,a\in A_\alpha(x)\}$ is a Borel subset of $\mathbb{X}\times\mathbb{A}$;

(b) if $v_\alpha(x)=+\infty$, then $A_\alpha(x)=\mathbb{A}$, and if $v_\alpha(x)<+\infty$, then $A_\alpha(x)$ is compact;

(vii) for the infinite horizon there exists a stationary discount-optimal policy $\phi^\alpha$, and a stationary policy $\phi^\alpha$ is
optimal if and only if $\phi^\alpha(x)\in A_\alpha(x)$ for all $x\in\mathbb{X}$;

(viii) (Feinberg and Lewis [17, Proposition 3.1(iv)]) if the cost function $c$ is inf-compact, then the functions
$v_{t,\alpha}$, $t=1,2,\ldots$, and $v_\alpha$ are inf-compact on $\mathbb{X}$.

The following corollary extends the previous theorem to nonzero terminal values F. This extension is useful 
for the analysis of inventory control problems. 

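Corollary 3.5 Let Assumption W* hold. Consider a bounded below, lower semi-continuous function $F:
\mathbb{X}\to\overline{\mathbb{R}}$. The following statements hold for $t=0,1,2,\ldots$, $N=0,1,2,\ldots$, and $\alpha\in[0,1)$:

(i) the functions $v_{t,F,\alpha}$ are bounded below and lower semi-continuous;

(ii) the value functions $v_{t+1,F,\alpha}$ satisfy the optimality equations

$$
v_{t+1,F,\alpha}(x)=\min_{a\in\mathbb{A}}\Big\{c(x,a)+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X}, \tag{3.9}
$$

where $v_{0,F,\alpha}(x)=F(x)$ for all $x\in\mathbb{X}$;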
(iii) the nonempty sets

$$
A_{t,F,\alpha}(x):=\Big\{a\in\mathbb{A}:v_{t+1,F,\alpha}(x)=c(x,a)+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,a)\Big\},\qquad x\in\mathbb{X},
$$

satisfy the following properties:

(a) the graph $\mathrm{Gr}_{\mathbb{X}}(A_{t,F,\alpha})=\{(x,a):x\in\mathbb{X},\,a\in A_{t,F,\alpha}(x)\}$ is a Borel subset of $\mathbb{X}\times\mathbb{A}$;

(b) the following hold:

(b1) if $v_{t+1,F,\alpha}(x)=+\infty$, then $A_{t,F,\alpha}(x)=\mathbb{A}$;

(b2) if $v_{t+1,F,\alpha}(x)<+\infty$, then $A_{t,F,\alpha}(x)$ is compact;

(iv) for an $(N+1)$-horizon problem with the terminal value function $F$, there exists a Markov optimal
policy $(\phi_0,\ldots,\phi_N)$, and if, for an $(N+1)$-horizon Markov policy $(\phi_0,\ldots,\phi_N)$, the inclusions
$\phi_{N-t}(x)\in A_{t,F,\alpha}(x)$, $x\in\mathbb{X}$, $t=0,\ldots,N$, hold, then this policy is $(N+1)$-horizon optimal;

(v) if $F(x)\le v_\alpha(x)$ for all $x\in\mathbb{X}$, then $v_{t,F,\alpha}(x)\to v_\alpha(x)$ as $t\to+\infty$ for all $x\in\mathbb{X}$;

(vi) if the cost function $c$ is inf-compact, then each of the functions $v_{t,F,\alpha}$, $t=1,2,\ldots$, is inf-compact.


Proof. Statements (i)-(iv) are corollaries of statements (i)-(iii) of Theorem 3.4. Indeed, the statements
of Theorem 3.4 that deal with the finite horizon $N$ also hold when the one-step costs vary over time,
in particular, when the one-step cost at epoch $t=0,1,\ldots,N$ is defined by a bounded below, measurable cost
function $c_t$ rather than by the function $c$. This case can be reduced to the case of a single cost function by
replacing the state space $\mathbb{X}$ with the state space $\mathbb{X}\times\{0,1,\ldots,N\}$, setting $c((x,t),a)=c_t(x,a)$, and applying
the corresponding statements of Theorem 3.4. In our case, $c_t(x,a)=c(x,a)$ for $t=0,1,\ldots,N-1$, and
$c_N(x,a)=c(x,a)+\alpha\int_{\mathbb{X}}F(y)\,q(dy|x,a)$. The function $c_N$ is bounded below and lower semi-continuous.

To prove (v) and (vi), consider first the case when the functions $c$ and $F$ are nonnegative. In this case,

$$
v_{t,\alpha}(x)\le v_{t,F,\alpha}(x)\le v_{t,v_\alpha,\alpha}(x)=v_\alpha(x),\qquad x\in\mathbb{X},\ t=0,1,\ldots. \tag{3.10}
$$

Therefore, for nonnegative cost functions, statement (v) follows from Theorem 3.4(i). Statement (vi) follows
from (v), Theorem 3.4(viii), and the fact that $v_{t,F,\alpha}\ge v_{t,\alpha}$ since $F$ is nonnegative. In the general case,
consider a finite positive constant $K$ such that the functions $c$ and $F$ are bounded below by $-K$. If the
cost functions $c$ and $F$ are increased by $K$, then the new cost functions are nonnegative, each finite-horizon
value function $v_{t,F,\alpha}$ is increased by the constant $d_t=K(1-\alpha^{t+1})/(1-\alpha)$, and the infinite-horizon value
function $v_\alpha$ is increased by the constant $d=K/(1-\alpha)$. Since $d_t\le d$ and $d_t\uparrow d$ as $t\to\infty$, the general
case follows from the case of nonnegative cost functions. ■
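The constant $d_t$ in the proof can be checked numerically. The sketch below uses a hypothetical finite MDP with some negative costs; the helper names are ours. It computes $v_{t,F,\alpha}$ by the recursion (3.9), shifts $c$ and $F$ by $K$, and compares the difference with $d_t=K(1-\alpha^{t+1})/(1-\alpha)$. A uniform shift does not change which action attains the minimum, so the difference is the same at every state.

```python
def finite_horizon_value(c, P, F, alpha, t):
    """Compute v_{t,F,alpha} on a finite MDP via v_0 = F and the recursion (3.9)."""
    n, m = len(c), len(c[0])
    v = list(F)
    for _ in range(t):
        v = [min(c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
                 for a in range(m)) for x in range(n)]
    return v


def shift(c, F, K):
    """Add the constant K to every one-step cost and to the terminal value."""
    return [[value + K for value in row] for row in c], [value + K for value in F]


c = [[-1.0, 0.5], [-2.0, 1.0]]
P = [[[0.5, 0.5], [0.0, 1.0]],
     [[1.0, 0.0], [0.3, 0.7]]]
F = [-1.0, 0.0]
alpha, K, t = 0.8, 3.0, 4
c_shifted, F_shifted = shift(c, F, K)
v = finite_horizon_value(c, P, F, alpha, t)
v_shifted = finite_horizon_value(c_shifted, P, F_shifted, alpha, t)
d_t = K * (1 - alpha ** (t + 1)) / (1 - alpha)
print([vs - vv for vs, vv in zip(v_shifted, v)], d_t)
```

Here $d_4=3(1-0.8^5)/0.2=10.0848$, and both printed differences agree with it up to rounding.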

While Theorem 3.4 and Corollary 3.5 state the convergence of value functions and describe the structure
of the sets of optimal actions, the following theorem describes convergence properties of optimal actions. For
$x\in\mathbb{X}$ and $\alpha\in[0,1)$, define the sets $D^*_\alpha(x):=\{a\in\mathbb{A}:c(x,a)\le v_\alpha(x)\}$.
Theorem 3.6 Let Assumption W* hold and $\alpha\in[0,1)$. Suppose $F:\mathbb{X}\to\overline{\mathbb{R}}$ is bounded below, lower
semi-continuous, and such that for all $x\in\mathbb{X}$

$$
F(x)\le v_\alpha(x)\quad\text{and}\quad v_{1,F,\alpha}(x)\ge F(x). \tag{3.11}
$$

For $x\in\mathbb{X}$ such that $v_\alpha(x)<\infty$, the following two statements hold:

(i) the set $D^*_\alpha(x)$ is compact, and $A_{t,F,\alpha}(x)\subseteq D^*_\alpha(x)$ for all $t=1,2,\ldots$, where the sets $A_{t,F,\alpha}(x)$ are
defined in Corollary 3.5(iii);

(ii) each sequence $\{a^{(t)}\in A_{t,F,\alpha}(x),\,t=1,2,\ldots\}$ is bounded, and all its limit points belong to $A_\alpha(x)$.

In particular, if $c(x,a)\ge0$ for all $x\in\mathbb{X}$, $a\in\mathbb{A}$, then the function $F(x)=0$ satisfies conditions (3.11).
In order to prove Theorem 3.6, we need the following lemma, which is a simplified version of [21, Lemma
4.6.6].

Lemma 3.7 Let $\tilde{A}$ be a compact subset of $\mathbb{A}$ and $f,f_n:\tilde{A}\to\mathbb{R}$, $n=1,2,\ldots$, be nonnegative, lower
semi-continuous, real-valued functions such that $f_n(a)\uparrow f(a)$ as $n\to\infty$ for all $a\in\tilde{A}$. Let $a^{(n)}\in
\operatorname{argmin}_{a\in\tilde{A}}f_n(a)$, $n=1,2,\ldots$, and let $a^*$ be a limit point of the sequence $\{a^{(n)},\,n=1,2,\ldots\}$. Then
$a^*\in\operatorname{argmin}_{a\in\tilde{A}}f(a)$.

Proof. Let $a'\in\operatorname{argmin}_{a\in\tilde{A}}f(a)$. Then $f(a')\ge f_n(a')\ge f_n(a^{(n)})\ge f_k(a^{(n)})$ for all $n\ge k$. Since $\tilde{A}$ is
compact, $a^*\in\tilde{A}$. Let $\{a^{(n_j)}\}$ be a subsequence converging to $a^*$. Lower semi-continuity of $f_k$ and the
previous inequalities imply $f_k(a^*)\le\liminf_{j\to\infty}f_k(a^{(n_j)})\le f(a')$. Thus $f(a')\ge f_k(a^*)\uparrow f(a^*)$ as $k\to\infty$.
Since $f(a^*)\le f(a')$, we have $a^*\in\operatorname{argmin}_{a\in\tilde{A}}f(a)$. ■
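Here is a small numerical illustration of Lemma 3.7 (our own example, not from the cited works). Take the finite, hence compact, grid $\tilde A=\{0,0.001,\ldots,1\}$, $f(a)=(a-0.5)^2+1$, and $f_n(a)=f(a)-a/n$. Each $f_n$ is nonnegative and continuous on $[0,1]$, and $f_n\uparrow f$. The exact minimizers of $f_n$ on $[0,1]$ are $\min(1,\,0.5+1/(2n))$, so $a^{(n)}$ approaches the minimizer $0.5$ of $f$.

```python
grid = [i / 1000 for i in range(1001)]  # finite (hence compact) stand-in for the set in Lemma 3.7


def f(a):
    return (a - 0.5) ** 2 + 1.0


def make_f_n(n):
    # f_n = f - a/n is nonnegative on [0, 1] and increases to f as n grows.
    return lambda a: f(a) - a / n


minimizers = [min(grid, key=make_f_n(n)) for n in (1, 10, 100, 1000)]
print(minimizers, min(grid, key=f))
```

The grid minimizers move toward $0.5$ as $n$ grows, in line with the lemma.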

Proof of Theorem 3.6. We assume without loss of generality that the bounded below functions $c$ and $F$
are nonnegative. We can do this because of the arguments provided at the end of the proof of Corollary 3.5
and the additional argument that, if the one-step cost function $c$ and the terminal cost function $F$ are shifted by
constants, then the sets of optimal finite-horizon actions $A_{t,F,\alpha}(\cdot)$ and infinite-horizon actions $A_\alpha(\cdot)$ remain
unchanged.

Fix $x\in\mathbb{X}$ with $v_\alpha(x)<\infty$. The set $D^*_\alpha(x)$ is compact because it is a level set of the function $c(x,\cdot)$, which
is inf-compact on $\mathbb{A}$ by the $\mathbb{K}$-inf-compactness of $c$ (take $K=\{x\}$). Since the function $v_{t,F,\alpha}$ is nonnegative
and, in view of (3.10), $v_{t+1,F,\alpha}(x)\le v_\alpha(x)$,

$$
A_{t,F,\alpha}(x)=\Big\{a\in\mathbb{A}:c(x,a)+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,a)=v_{t+1,F,\alpha}(x)\Big\}\subseteq D^*_\alpha(x),\qquad t=1,2,\ldots.
$$

Statement (i) is proved. Since $D^*_\alpha(x)$ is compact, every sequence $\{a^{(t)}\in A_{t,F,\alpha}(x)\}_{t=1,2,\ldots}$ is bounded and
has a limit point. The theorem follows from Lemma 3.7 applied to the set $\tilde{A}:=D^*_\alpha(x)$ and the functions

$$
f(a)=c(x,a)+\alpha\int_{\mathbb{X}}v_\alpha(y)\,q(dy|x,a),\qquad a\in\tilde{A},
$$

$$
f_t(a)=c(x,a)+\alpha\int_{\mathbb{X}}v_{t,F,\alpha}(y)\,q(dy|x,a),\qquad a\in\tilde{A},\ t=0,1,\ldots.
$$

To verify the conditions of Lemma 3.7, observe that for all $z\in\mathbb{X}$

$$
v_\alpha(z)=v_{t,v_\alpha,\alpha}(z)\ge v_{t,F,\alpha}(z)\ge v_{t,\alpha}(z)\uparrow v_\alpha(z),
$$

where the first equality follows from the optimality equation, the first and the second inequalities follow
from $v_\alpha(\cdot)\ge F(\cdot)\ge0$, and the convergence is stated in Theorem 3.4(i); this convergence is monotone
because $c$ and $F$ are nonnegative functions. The inequality $v_{1,F,\alpha}(\cdot)\ge F(\cdot)$ in (3.11), equality (3.9), and
standard induction arguments imply $v_{t+1,F,\alpha}(\cdot)\ge v_{t,F,\alpha}(\cdot)$, $t=0,1,\ldots$. Thus Assumption (3.11) implies
that $v_{t,F,\alpha}\uparrow v_\alpha$, and the monotone convergence theorem implies $f_t\uparrow f$ as $t\to\infty$. ■
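The following sketch illustrates Theorem 3.6 on a hypothetical two-state finite model (our own toy example; the helper names and the tie tolerance `1e-9` are assumptions of the sketch). With $F=0$ and $c\ge0$, conditions (3.11) hold. The computed sets $A_{t,F,\alpha}(x)$ stay inside $D^*_\alpha(x)$ and, in this example, coincide with $A_\alpha(x)$ from $t=2$ on. In general the theorem only asserts that limit points of optimal finite-horizon actions lie in $A_\alpha(x)$.

```python
def bellman_terms(c, P, v, alpha, x):
    """Return [c(x, a) + alpha * sum_y q({y} | x, a) v(y) for each action a]."""
    n = len(c)
    return [c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
            for a in range(len(c[x]))]


def argmin_set(values, tol=1e-9):
    best = min(values)
    return {a for a, value in enumerate(values) if value <= best + tol}


c = [[1.0, 2.0], [0.0, 0.0]]          # nonnegative costs, so F = 0 satisfies (3.11)
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
alpha, n = 0.9, 2

history = []                          # history[t][x] = A_{t,F,alpha}(x), t = 0, 1, ...
v = [0.0] * n                         # v_{0,F,alpha} = F = 0
for t in range(50):
    terms = [bellman_terms(c, P, v, alpha, x) for x in range(n)]
    history.append([argmin_set(row) for row in terms])
    v = [min(row) for row in terms]   # v_{t+1,F,alpha}

v_alpha = v                           # 50 iterations suffice for this toy model
A_alpha = [argmin_set(bellman_terms(c, P, v_alpha, alpha, x)) for x in range(n)]
D_star = [{a for a in range(len(c[x])) if c[x][a] <= v_alpha[x] + 1e-9} for x in range(n)]
print(history[0][0], history[-1][0], A_alpha[0])  # {0} {1} {1}
```

At state 0 the short-horizon problems prefer staying (action 0), while for longer horizons and for the infinite horizon moving (action 1) is optimal.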


4 Average-Cost MDPs with Borel State and Action Sets 

The average cost case is more subtle than the case of expected total discounted costs. The following assumption
was introduced by Schäl [30]. Without this assumption the problem is trivial because $w(x)=+\infty$ for
all $x\in\mathbb{X}$, and therefore every policy is optimal.

Assumption G. $w^*:=\inf_{x\in\mathbb{X}}w(x)<+\infty$.

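Assumption G is equivalent to the existence of $x\in\mathbb{X}$ and $\pi\in\Pi$ with $w^\pi(x)<\infty$. Define the following
quantities for $\alpha\in[0,1)$:

$$
m_\alpha=\inf_{x\in\mathbb{X}}v_\alpha(x),\qquad u_\alpha(x)=v_\alpha(x)-m_\alpha,
$$

$$
\underline{w}=\liminf_{\alpha\uparrow1}(1-\alpha)m_\alpha,\qquad\overline{w}=\limsup_{\alpha\uparrow1}(1-\alpha)m_\alpha.
$$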

Observe that $u_\alpha(x)\ge0$ for all $x\in\mathbb{X}$. According to Schäl [30, Lemma 1.2], Assumption G implies

$$
0\le\underline{w}\le\overline{w}\le w^*<+\infty. \tag{4.1}
$$
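The quantities $m_\alpha$, $u_\alpha$, and $(1-\alpha)m_\alpha$ can be computed for a hypothetical two-state model (our own example, not from the paper). In state 0 the decision-maker either pays 1 and stays or pays 3 and moves to state 1; state 1 is absorbing with cost 0.5 per step. For $\alpha\ge0.8$ one can check by hand that $v_\alpha(1)=0.5/(1-\alpha)$ and $v_\alpha(0)=3+0.5\alpha/(1-\alpha)$. Hence $(1-\alpha)m_\alpha=0.5$, $u_\alpha(0)=2.5$ stays bounded as $\alpha\uparrow1$, and $(1-\alpha)v_\alpha(0)=3(1-\alpha)+0.5\alpha\to0.5$. The sketch approximates $v_\alpha$ by value iteration; the tolerance is an assumption.

```python
def value_iteration(c, P, alpha, tol=1e-11, max_iter=10_000_000):
    """Iterate the discounted optimality equation (3.3) on a finite MDP, starting from 0."""
    n, m = len(c), len(c[0])
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [min(c[x][a] + alpha * sum(P[x][a][y] * v[y] for y in range(n))
                     for a in range(m)) for x in range(n)]
        if max(abs(new - old) for new, old in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


c = [[1.0, 3.0], [0.5, 0.5]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: stay (cost 1) or move to state 1 (cost 3)
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1: absorbing, cost 0.5 for either action
results = {}
for alpha in (0.9, 0.99, 0.999):
    v = value_iteration(c, P, alpha)
    m_alpha = min(v)                      # m_alpha = inf_x v_alpha(x)
    u_alpha = [vx - m_alpha for vx in v]  # u_alpha(x) = v_alpha(x) - m_alpha >= 0
    results[alpha] = ((1 - alpha) * m_alpha, (1 - alpha) * v[0], u_alpha)
    print(alpha, results[alpha])
```

For $\alpha=0.999$ value iteration needs tens of thousands of sweeps, which illustrates why the vanishing-discount limit is not a practical way to compute average costs, even though it is a useful analytical tool.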

Moreover, Schäl [30, Proposition 1.3] states that, if there exist a measurable function $u:\mathbb{X}\to\mathbb{R}_+$, where
$\mathbb{R}_+:=[0,+\infty)$, and a stationary policy $\phi$ such that

$$
\underline{w}+u(x)\ge c(x,\phi(x))+\int_{\mathbb{X}}u(y)\,q(dy|x,\phi(x)),\qquad x\in\mathbb{X}, \tag{4.2}
$$

then $\phi$ is average-cost optimal and $w(x)=w^*$ for all $x\in\mathbb{X}$. The following condition plays an important
role for the validity of (4.2).

Assumption B. Assumption G holds and $\sup_{\alpha\in[0,1)}u_\alpha(x)<\infty$ for all $x\in\mathbb{X}$.

We note that the second part of Assumption B is Condition B in Schäl [30]. Thus, under Assumption G,
which is assumed throughout [30], Assumption B is equivalent to Condition B in [30].

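For $x \in \mathbb{X}$ and for a nonnegative lower semi-continuous function $u : \mathbb{X} \to \mathbb{R}_+$, define the set

$$ A^*_u(x) := \Big\{ a \in \mathbb{A} : \underline{w} + u(x) \ge c(x,a) + \int_{\mathbb{X}} u(y)\,q(dy|x,a) \Big\}, \qquad x \in \mathbb{X}. \tag{4.3} $$

A stationary policy $\phi$ satisfies (4.2) if and only if $A^*_u(x) \ne \emptyset$ and $\phi(x) \in A^*_u(x)$ for all $x \in \mathbb{X}$.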
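For a finite model, membership in $A^*_u(x)$ from (4.3) is a finite comparison. The sketch below uses a hypothetical helper `acoi_action_sets` on an invented two-state, two-action MDP and evaluates (4.3) for given $\underline{w}$ and $u$. With $\underline{w} = 1$ and $u = (2, 0)$, values one can verify by hand for this toy model, each set $A^*_u(x)$ is a singleton, and selecting from it gives the stationary policy that moves from state 0 to state 1 and then stays there.

```python
import numpy as np

def acoi_action_sets(cost, P, u, w_lower, tol=1e-9):
    """For each state x, return the indices a with
    w_lower + u(x) >= c(x, a) + sum_y q(y | x, a) u(y), i.e. the set A*_u(x) of (4.3)."""
    rhs = cost + P @ u            # c(x, a) + integral of u(y) q(dy | x, a)
    lhs = w_lower + u             # w + u(x)
    return [np.flatnonzero(lhs[x] + tol >= rhs[x]) for x in range(cost.shape[0])]

# Hypothetical toy MDP: state 0 stays (cost 2) or moves to 1 (cost 3);
# state 1 stays (cost 1) or moves to 0 (cost 0).
cost = np.array([[2.0, 3.0],
                 [1.0, 0.0]])
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0

sets = acoi_action_sets(cost, P, u=np.array([2.0, 0.0]), w_lower=1.0)
policy = [int(s[0]) for s in sets]   # any selector phi(x) in A*_u(x) satisfies (4.2)
print(policy)                        # [1, 0]
```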

Following Feinberg et al. [14, Formula (21)], define

$$ \tilde u(x) := \liminf_{(y,\alpha)\to(x,1-)} u_\alpha(y), \qquad x \in \mathbb{X}. \tag{4.4} $$

In words, $\tilde u(x)$ is the largest number such that $\tilde u(x) \le \liminf_{n\to\infty} u_{\alpha_n}(y_n)$ for all sequences $\{y_n, n \ge 1\}$ and $\{\alpha_n, n \ge 1\}$ such that $y_n \to x$ and $\alpha_n \to 1$.

Following Schäl [30, Page 166], where the notation $w$ is used instead of $\underline{u}$, and Feinberg et al. [14, Formula (38)], for a particular sequence $\alpha_n \to 1-$, define

$$ \underline{u}(x) := \liminf_{(y,n)\to(x,\infty)} u_{\alpha_n}(y), \qquad x \in \mathbb{X}. \tag{4.5} $$

In words, $\underline{u}(x)$ is the largest number such that $\underline{u}(x) \le \liminf_{n\to\infty} u_{\alpha_n}(y_n)$ for all sequences $\{y_n, n \ge 1\}$ converging to $x$.

It follows from these definitions that $\tilde u(x) \le \underline{u}(x)$, $x \in \mathbb{X}$. However, the questions of whether $\tilde u = \underline{u}$ and whether the values of $\underline{u}$ depend on a particular choice of the sequence $\alpha_n$ have not been investigated. If Assumption B holds, then $\underline{u}(x) < +\infty$, $x \in \mathbb{X}$. If Assumption B holds and the cost function $c$ is inf-compact, then the functions $v_\alpha$, $\tilde u$, and $\underline{u}$ are inf-compact as well; see Theorem 3.4(i) for this fact for $v_\alpha$ and Feinberg et al. [14, Theorem 4(e) and Corollary 2] for $\tilde u$ and $\underline{u}$.

Theorem 4.1 (Feinberg et al. [14, Theorem 4 and Corollary 2]). Suppose Assumptions W* and B hold. The following two properties hold for the function $u = \tilde u$ defined in (4.4) and for $u = \underline{u}$, where $\underline{u}$ is defined in (4.5) for a sequence $\{\alpha_n, n \ge 1\}$ such that $\alpha_n \uparrow 1$:

(a) for each $x \in \mathbb{X}$ the set $A^*_u(x)$ is nonempty and compact;

(b) the graph $\mathrm{Gr}_{\mathbb{X}}(A^*_u) = \{(x,a) : x \in \mathbb{X},\ a \in A^*_u(x)\}$ is a Borel subset of $\mathbb{X}\times\mathbb{A}$.

Furthermore, the following statements hold:

(i) there exists a stationary policy $\phi$ satisfying (4.2);

(ii) every policy $\phi$ satisfying (4.2) is optimal for the average cost per unit time criterion, and

$$ w^\phi(x) = w(x) = \underline{w} = \overline{w} = w^* = \lim_{\alpha\uparrow 1}(1-\alpha)v_\alpha(x) = \lim_{N\to\infty}\frac{1}{N}\,\mathbb{E}^\phi_x \sum_{t=0}^{N-1} c(x_t,a_t), \qquad x \in \mathbb{X}. \tag{4.6} $$


If the one-step cost function $c$ is inf-compact, the minima of the functions $v_\alpha$ possess additional properties. Set

$$ \mathbb{X}_\alpha := \{x \in \mathbb{X} : v_\alpha(x) = m_\alpha\}, \qquad \alpha \in [0,1). \tag{4.7} $$

Since $\mathbb{X}_\alpha = \{x \in \mathbb{X} : v_\alpha(x) \le m_\alpha\}$, this set is closed if Assumptions G and W* hold. If the function $c$ is inf-compact, then inf-compactness of $v_\alpha$ implies that the sets $\mathbb{X}_\alpha$ are nonempty and compact. The following fact is useful for verifying the validity of Assumption B; see Feinberg and Lewis [17, Lemma 5.1] and the references therein.
Theorem 4.2 (Feinberg et al. [14, Theorem 6]). Let Assumptions G and W* hold. If the function $c$ is inf-compact, then there exists a compact set $\mathcal{K} \subset \mathbb{X}$ such that $\mathbb{X}_\alpha \subset \mathcal{K}$ for all $\alpha \in [0,1)$.

According to Feinberg et al. [14, Theorem 5 and Corollary 3], certain average cost optimal policies can be approximated by discount optimal policies with a vanishing discount factor. The following theorem describes particular constructions of such approximations. Recall that, for the function $\tilde u(x)$ defined in (4.4), for each $x \in \mathbb{X}$ there exist sequences $\{\alpha_n, n \ge 1\}$ and $\{x^{(n)}, n \ge 1\}$, where $x^{(n)} \in \mathbb{X}$, such that $\alpha_n \uparrow 1$, $x^{(n)} \to x$, and $\tilde u(x) = \lim_{n\to\infty} u_{\alpha_n}(x^{(n)})$. Similarly, for a sequence $\{\alpha_n, n \ge 1\}$ such that $\alpha_n \uparrow 1$, consider the function $\underline{u}$ defined in (4.5). For each $x \in \mathbb{X}$ there exist a sequence $\{x^{(n)}, n \ge 1\}$ of points in $\mathbb{X}$ converging to $x$ and a subsequence $\{\alpha^*_n, n \ge 1\}$ of the sequence $\{\alpha_n, n \ge 1\}$ such that $\underline{u}(x) = \lim_{n\to\infty} u_{\alpha^*_n}(x^{(n)})$.

Theorem 4.3 Let Assumptions W* and B hold. For $x \in \mathbb{X}$ and $a^* \in \mathbb{A}$, the following two statements hold:

(i) Consider a sequence $\{(x^{(n)},\alpha_n), n \ge 1\}$ with $0 \le \alpha_n \uparrow 1$, $x^{(n)} \in \mathbb{X}$, $x^{(n)} \to x$, and $u_{\alpha_n}(x^{(n)}) \to \tilde u(x)$ as $n \to \infty$. If there are sequences of natural numbers $\{n_k, k \ge 1\}$ and actions $\{a^{(n_k)} \in A_{\alpha_{n_k}}(x^{(n_k)}), k \ge 1\}$ such that $n_k \to \infty$ and $a^{(n_k)} \to a^*$ as $k \to \infty$, then $a^* \in A^*_{\tilde u}(x)$, where the function $\tilde u$ is defined in (4.4);

(ii) Suppose $\{\alpha_n, n \ge 1\}$ is a sequence of discount factors such that $\alpha_n \uparrow 1$. Let $\{\alpha^*_n, n \ge 1\}$ be its subsequence and $\{x^{(n)}, n \ge 1\}$ be a sequence of states such that $x^{(n)} \to x$ and $u_{\alpha^*_n}(x^{(n)}) \to \underline{u}(x)$ as $n \to \infty$, where the function $\underline{u}$ is defined in (4.5) for the sequence $\{\alpha_n, n \ge 1\}$. If there are actions $a^{(n)} \in A_{\alpha^*_n}(x^{(n)})$ such that $a^{(n)} \to a^*$ as $n \to \infty$, then $a^* \in A^*_{\underline{u}}(x)$.

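Proof. To show (i), consider sequences whose existence is assumed in the theorem. We have

$$ v_{\alpha_{n_k}}(x^{(n_k)}) = c(x^{(n_k)},a^{(n_k)}) + \alpha_{n_k}\int_{\mathbb{X}} v_{\alpha_{n_k}}(y)\,q(dy|x^{(n_k)},a^{(n_k)}). $$

This implies (with a little algebra)

$$ u_{\alpha_{n_k}}(x^{(n_k)}) + (1-\alpha_{n_k})m_{\alpha_{n_k}} = c(x^{(n_k)},a^{(n_k)}) + \alpha_{n_k}\int_{\mathbb{X}} u_{\alpha_{n_k}}(y)\,q(dy|x^{(n_k)},a^{(n_k)}). $$

As $k \to \infty$, the left-hand side converges to $\tilde u(x) + \underline{w}$ by the choice of the sequence $x^{(n)}$ and by Theorem 4.1, which implies $\underline{w} = \overline{w}$ and hence $(1-\alpha_{n_k})m_{\alpha_{n_k}} \to \underline{w}$. Fatou's lemma for weakly converging measures (see, e.g., Feinberg et al. [16, Theorem 1.1]), applied to the right-hand side together with the weak continuity of $q$, the lower semi-continuity of $c$, and definition (4.4), yields

$$ \underline{w} + \tilde u(x) \ge c(x,a^*) + \int_{\mathbb{X}} \tilde u(y)\,q(dy|x,a^*). $$

Thus $a^* \in A^*_{\tilde u}(x)$. The proof of Statement (ii) is similar. ■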
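For completeness, the "little algebra" step in the proof can be written out. Fix $\alpha \in [0,1)$, $x \in \mathbb{X}$, and an optimal action $a \in A_\alpha(x)$; substitute $v_\alpha = u_\alpha + m_\alpha$ into the optimality equation and use $q(\mathbb{X}|x,a) = 1$:

```latex
\begin{aligned}
u_\alpha(x) + m_\alpha
  &= c(x,a) + \alpha\int_{\mathbb{X}} \big(u_\alpha(y) + m_\alpha\big)\,q(dy|x,a)
   = c(x,a) + \alpha\int_{\mathbb{X}} u_\alpha(y)\,q(dy|x,a) + \alpha m_\alpha, \\
u_\alpha(x) + (1-\alpha)m_\alpha
  &= c(x,a) + \alpha\int_{\mathbb{X}} u_\alpha(y)\,q(dy|x,a).
\end{aligned}
```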

Corollary 4.4 Let Assumptions W* and B hold. For $x \in \mathbb{X}$ and $a^* \in \mathbb{A}$, the following hold:

(i) if each sequence $\{(\alpha_n, x^{(n)}), n \ge 1\}$ with $0 \le \alpha_n \uparrow 1$, $x^{(n)} \in \mathbb{X}$, and $x^{(n)} \to x$ contains a subsequence $\{(\alpha_{n_k}, x^{(n_k)}), k \ge 1\}$ such that there exist actions $a^{(n_k)} \in A_{\alpha_{n_k}}(x^{(n_k)})$ satisfying $a^{(n_k)} \to a^*$ as $k \to \infty$, then $a^* \in A^*_{\tilde u}(x)$ with the function $\tilde u$ defined in (4.4);

(ii) if there exists a sequence $\{\alpha_n, n \ge 1\}$ such that $\alpha_n \uparrow 1$ and, for every sequence of states $x^{(n)} \to x$ from $\mathbb{X}$, there are actions $a^{(n)} \in A_{\alpha_n}(x^{(n)})$, $n = 1,2,\ldots$, satisfying $a^{(n)} \to a^*$ as $n \to \infty$, then $a^* \in A^*_{\underline{u}}(x)$, where the function $\underline{u}$ is defined in (4.5) for the sequence $\{\alpha_n, n \ge 1\}$.
Proof. Statement (i) follows from Theorem 4.3(i) applied to a sequence $\{(\alpha_n, x^{(n)}), n \ge 1\}$ with the property $\tilde u(x) = \lim_{n\to\infty} u_{\alpha_n}(x^{(n)})$. Statement (ii) follows from Theorem 4.3(ii) applied to a sequence $\{x^{(n)}, n \ge 1\}$ and a subsequence $\{\alpha^*_n, n \ge 1\}$ of $\{\alpha_n, n \ge 1\}$ such that $\underline{u}(x) = \lim_{n\to\infty} u_{\alpha^*_n}(x^{(n)})$. ■

The following theorem is useful for proving asymptotic properties of optimal actions for discounted problems when the discount factor tends to 1.

Theorem 4.5 Let Assumptions W* and B hold. For $x \in \mathbb{X}$ the following hold:

(i) There exists a compact set $D^*(x) \subset \mathbb{A}$ such that $A_\alpha(x) \subset D^*(x)$ for all $\alpha \in [0,1)$;

(ii) If $\{\alpha_n, n \ge 1\}$ is a sequence of discount factors $\alpha_n \in [0,1)$, then every sequence of infinite-horizon $\alpha_n$-discount cost optimal actions $\{a^{(n)}, n \ge 1\}$, where $a^{(n)} \in A_{\alpha_n}(x)$, lies in the compact set $D^*(x)$ and therefore has a limit point $a^* \in \mathbb{A}$.


Proof. For each $x$, the set of optimal actions $A_\alpha(x)$ in state $x$ does not change if a constant is added to the cost function $c$. Therefore, we assume without loss of generality that the cost function $c$ is nonnegative. Fix $x \in \mathbb{X}$ and $\varepsilon^* > 0$. Since $x$ is fixed, we sometimes omit it. For $\alpha \in [0,1)$ and $a \in \mathbb{A}$ define

$$ U(x) := \sup_{\alpha\in[0,1)} u_\alpha(x), $$

$$ f_\alpha(a) := c(x,a) + \alpha\int_{\mathbb{X}} v_\alpha(y)\,q(dy|x,a), $$

$$ g_\alpha(a) := c(x,a) + \alpha\int_{\mathbb{X}} u_\alpha(y)\,q(dy|x,a). $$

Observe that $g_\alpha(a) = f_\alpha(a) - \alpha m_\alpha$ and

$$ A_\alpha(x) = \Big\{a \in \mathbb{A} : f_\alpha(a) = \min_{b\in\mathbb{A}} f_\alpha(b)\Big\} = \Big\{a \in \mathbb{A} : g_\alpha(a) = \min_{b\in\mathbb{A}} g_\alpha(b)\Big\}. $$

Assumption B implies that $U(x) < +\infty$, and Theorem 4.1 implies that $\lim_{\alpha\uparrow 1}(1-\alpha)m_\alpha = w^*$. As shown in Feinberg et al. [14, the first displayed formula on p. 602], there exists $\alpha_0 \in [0,1)$ such that, for $\alpha \in [\alpha_0, 1)$,

$$ w^* + \varepsilon^* + U(x) \ge (1-\alpha)m_\alpha + u_\alpha(x) = \min_{a\in\mathbb{A}} g_\alpha(a). $$

This implies that, for $\alpha \in [\alpha_0, 1)$,

$$ A_\alpha(x) \subset \mathcal{D}_{g_\alpha}(\lambda_1; \mathbb{A}) \subset \mathcal{D}_{g_0}(\lambda_1; \mathbb{A}), $$

where the definition of the level sets $\mathcal{D}_{\cdot}(\cdot\,;\cdot)$ is given in (3.5), $\lambda_1 := w^* + \varepsilon^* + U(x)$, and the second inclusion holds because the function $u_\alpha$ takes nonnegative values. Recall that $f_0(a) = g_0(a) = c(x,a)$, $a \in \mathbb{A}$, and the function $c(x,\cdot) : \mathbb{A} \to \mathbb{R}$ is inf-compact. Therefore, $\mathcal{D}_{f_0}(\lambda; \mathbb{A}) = \mathcal{D}_{g_0}(\lambda; \mathbb{A})$, and this set is compact for each $\lambda \in \mathbb{R}$. In addition, for all $\alpha \in [0, \alpha_0)$,

$$ v_{\alpha_0}(x) \ge v_\alpha(x) = \min_{a\in\mathbb{A}} f_\alpha(a), $$

where the inequality holds because one-step costs $c$ are nonnegative. The equality is simply the optimality equation (3.8). This implies that, for $\alpha \in [0, \alpha_0)$,

$$ A_\alpha(x) \subset \mathcal{D}_{f_\alpha}(v_{\alpha_0}(x); \mathbb{A}) \subset \mathcal{D}_{f_0}(v_{\alpha_0}(x); \mathbb{A}). $$

Let $D^*(x) := \mathcal{D}_{g_0}(\lambda_1; \mathbb{A}) \cup \mathcal{D}_{f_0}(v_{\alpha_0}(x); \mathbb{A})$. This set is compact as it is the union of two compact sets, and $A_\alpha(x) \subset D^*(x)$ for all $\alpha \in [0,1)$. Statement (i) is proved, and it implies Statement (ii). ■
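The identity and the comparisons used in this proof can be collected in one display. It uses only $v_\alpha = u_\alpha + m_\alpha$, the optimality equation $\min_{a} f_\alpha(a) = v_\alpha(x)$ from (3.8), and the nonnegativity of $u_\alpha$ and $v_\alpha$:

```latex
\begin{aligned}
\min_{a\in\mathbb{A}} g_\alpha(a) &= \min_{a\in\mathbb{A}} f_\alpha(a) - \alpha m_\alpha
   = v_\alpha(x) - \alpha m_\alpha = u_\alpha(x) + (1-\alpha)m_\alpha, \\
g_\alpha(a) &\ge g_0(a) = c(x,a), \qquad f_\alpha(a) \ge f_0(a) = c(x,a), \qquad a \in \mathbb{A}.
\end{aligned}
```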

5 MDPs Defined by Stochastic Equations 

Let $\mathbb{S}$ be a metric space, $\mathcal{B}(\mathbb{S})$ be its Borel $\sigma$-field, and $\mu$ be a probability measure on $(\mathbb{S}, \mathcal{B}(\mathbb{S}))$. Consider a stochastic sequence $\{x_t, t \ge 0\}$ whose dynamics are defined by the stochastic equation

$$ x_{t+1} = f(x_t, a_t, \xi_{t+1}), \qquad t = 0,1,\ldots, \tag{5.1} $$

where $\{\xi_t, t \ge 1\}$ are independent and identically distributed random variables with values in $\mathbb{S}$, whose distributions are defined by the probability measure $\mu$, and $f : \mathbb{X}\times\mathbb{A}\times\mathbb{S} \to \mathbb{X}$ is a continuous mapping. This equation defines the transition probability

$$ q(B|x,a) = \int_{\mathbb{S}} \mathbf{I}\{f(x,a,s) \in B\}\,\mu(ds), \qquad B \in \mathcal{B}(\mathbb{X}), \tag{5.2} $$

from $\mathbb{X}\times\mathbb{A}$ to $\mathbb{X}$, where $\mathbf{I}$ is the indicator function.
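Formula (5.2) says that $q(B|x,a)$ is the probability that $f(x,a,\xi)$ falls in $B$ when $\xi$ has distribution $\mu$, so it can be estimated by simulation. The sketch below is a hypothetical illustration: the helper `estimate_q` and the exponential demand are assumptions, not from the paper. It uses the inventory map $f(x,a,D) = x + a - D$ of Section 6 with $D$ exponential with mean 1 and $B = (-\infty, 0]$, for which $q(B|x,a) = e^{-(x+a)}$ when $x + a \ge 0$.

```python
import math
import random

def estimate_q(f, sample_xi, in_B, x, a, n=100_000, seed=0):
    """Monte Carlo estimate of q(B | x, a) = integral of I{f(x, a, s) in B} mu(ds), cf. (5.2)."""
    rng = random.Random(seed)
    hits = sum(in_B(f(x, a, sample_xi(rng))) for _ in range(n))
    return hits / n

# Hypothetical example: f(x, a, D) = x + a - D, D ~ Exponential(1), B = (-inf, 0].
est = estimate_q(lambda x, a, d: x + a - d,
                 lambda rng: rng.expovariate(1.0),
                 lambda y: y <= 0.0,
                 x=1.0, a=0.5)
print(est, math.exp(-1.5))  # the Monte Carlo estimate should be close to exp(-1.5)
```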

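Lemma 5.1 The transition probability $q$ is weakly continuous in $(x,a) \in \mathbb{X}\times\mathbb{A}$.

Proof. For a closed subset $B$ of $\mathbb{X}$ and for two sequences $x^{(n)} \to x$ and $a^{(n)} \to a$ as $n \to +\infty$ defined on $\mathbb{X}$ and $\mathbb{A}$ respectively,

$$ \limsup_{n\to\infty} q(B|x^{(n)},a^{(n)}) \le \int_{\mathbb{S}} \limsup_{n\to\infty} \mathbf{I}\{f(x^{(n)},a^{(n)},s) \in B\}\,\mu(ds) \le \int_{\mathbb{S}} \mathbf{I}\{f(x,a,s) \in B\}\,\mu(ds) = q(B|x,a), $$

where the first inequality follows from Fatou's lemma and the second follows from (5.2) and upper semi-continuity of the function $\mathbf{I}\{f(x,a,s) \in B\}$ on $\mathbb{X}\times\mathbb{A}\times\mathbb{S}$ for a closed set $B$. The weak continuity of $q$ follows from Billingsley [6, Theorem 2.1]. ■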


Corollary 5.2 Consider an MDP $\{\mathbb{X}, \mathbb{A}, q, c\}$ with the transition function $q$ defined in (5.2) for the continuous function $f$ introduced in (5.1) and with the nonnegative $\mathbb{K}$-inf-compact cost function $c$. This MDP satisfies Assumption W*, and therefore the conclusions of Theorem 3.4 hold.
Proof. Assumption W*(i) is assumed in the corollary. Assumption W*(ii) holds in view of Lemma 5.1. ■

For inventory control problems, MDPs are usually defined by particular forms of (5.1). In addition, the cost function $c$ has the form

$$ c(x,a) = C(a) + H(x,a), \tag{5.3} $$

where $C(a)$ is the ordering cost and $H(x,a)$ is either the holding/backordering cost or the expected holding/backordering cost at the following step. For simplicity we assume that these functions take nonnegative values. These functions are typically inf-compact. If $C$ is lower semi-continuous and $H$ is inf-compact, then $c$ is inf-compact because $C$ is lower semi-continuous as a function of two variables $x \in \mathbb{X}$ and $a \in \mathbb{A}$, and a sum of a nonnegative lower semi-continuous function and an inf-compact function is an inf-compact function. However, as stated in the following theorem, for discounted problems the validity of Assumption W*, and therefore the validity of the optimality equations, the existence of optimal policies, and the convergence of value iterations, hold even under weaker assumptions on the functions $C(a)$ and $H(x,a)$.

Theorem 5.3 Consider an MDP $\{\mathbb{X}, \mathbb{A}, q, c\}$ with the transition function $q$ defined in (5.2) and cost function $c$ defined in (5.3). If either of the following two assumptions holds:

(1) the function $C : \mathbb{A} \to [0,\infty]$ is lower semi-continuous and the function $H : \mathbb{X}\times\mathbb{A} \to [0,\infty]$ is $\mathbb{K}$-inf-compact;

(2) the function $C : \mathbb{A} \to [0,\infty]$ is inf-compact and the function $H : \mathbb{X}\times\mathbb{A} \to [0,\infty]$ is lower semi-continuous;

then Assumption W* holds and therefore the conclusions of Theorems 3.4(i)-(vii), 3.6 and Corollary 3.5(i)-(v) hold. Furthermore, if either of the following two assumptions holds:

(i) the function $C : \mathbb{A} \to [0,\infty]$ is lower semi-continuous and the function $H : \mathbb{X}\times\mathbb{A} \to [0,\infty]$ is inf-compact;

(ii) the function $C : \mathbb{A} \to [0,\infty]$ is inf-compact and the function $H^* : \mathbb{A}\times\mathbb{X} \to [0,\infty]$ is $\mathbb{K}$-inf-compact, where $H^*(a,x) := H(x,a)$ for all $x \in \mathbb{X}$ and all $a \in \mathbb{A}$;

then the function $c$ is inf-compact and therefore the conclusions of Theorems 3.4, 3.6 and Corollary 3.5 hold.

Proof. Lemma 5.1 implies the weak continuity of the transition function $q$. The definition of a $\mathbb{K}$-inf-compact function implies directly that the function $C^*(x,a) := C(a)$ is $\mathbb{K}$-inf-compact on $\mathbb{X}\times\mathbb{A}$ if the function $C : \mathbb{A} \to [0,\infty]$ is inf-compact. Thus, under assumption (1) or (2), $c$ is a $\mathbb{K}$-inf-compact function because it is a sum of a nonnegative lower semi-continuous function and a $\mathbb{K}$-inf-compact function. In addition, under assumption (i), as explained in the paragraph preceding the formulation of the theorem, the one-step cost function $c$ is inf-compact. Under assumption (ii), the function $c$ is inf-compact because of the following arguments. Let $c^*(a,x) := C(a) + H^*(a,x)$ for all $(a,x) \in \mathbb{A}\times\mathbb{X}$. The function $c^* : \mathbb{A}\times\mathbb{X} \to [0,\infty]$ is lower semi-continuous if either assumption (1) or assumption (2) holds. Since $c(x,a) = c^*(a,x)$ for all $x \in \mathbb{X}$ and $a \in \mathbb{A}$, the function $c : \mathbb{X}\times\mathbb{A} \to [0,\infty]$ is inf-compact if and only if the function $c^* : \mathbb{A}\times\mathbb{X} \to [0,\infty]$ is inf-compact. The function $c^*$ is a sum of the nonnegative lower semi-continuous function $C$ and the $\mathbb{K}$-inf-compact function $H^*$. Therefore, $c^*$ is $\mathbb{K}$-inf-compact. Consider an arbitrary $\lambda \in \mathbb{R}$. Since $c^*(a,x) \ge C(a) > \lambda$ for $a \notin \mathcal{D}_C(\lambda; \mathbb{A})$, we have $\mathcal{D}_{c^*}(\lambda; \mathbb{A}\times\mathbb{X}) = \mathcal{D}_{c^*}(\lambda; \mathcal{D}_C(\lambda; \mathbb{A})\times\mathbb{X})$, and this set is compact because the set $\mathcal{D}_C(\lambda; \mathbb{A})$ is compact and the function $c^*$ is $\mathbb{K}$-inf-compact. Thus the functions $c^*$ and $c$ are inf-compact. ■

Remark 5.4 In view of Theorem 3.4, Assumption W* implies the existence of optimal policies for the expected total discounted cost criterion. It is also possible to derive sufficient conditions for the validity of Assumptions G and B and therefore for the existence of stationary optimal policies for the average costs per unit time criterion. However, this is more subtle than for Assumption W*, and in this paper we verify Assumptions G and B directly for the periodic-review inventory control problems.

6 Optimality of (s, S) Policies for Setup-Cost Inventory Control Problems 

In this section we consider a discrete-time periodic-review inventory control problem with backorders, prove the existence of an optimal $(s,S)$ policy, and establish several relevant results. For this problem, the state space is $\mathbb{X} := \mathbb{R}$, the action space is $\mathbb{A} := \mathbb{R}_+$, and the dynamics are defined by the following stochastic equation

$$ x_{t+1} = x_t + a_t - D_{t+1}, \qquad t = 0,1,2,\ldots, \tag{6.1} $$

where $x_t$ is the inventory at the end of period $t$, $a_t$ is the decision on how much should be ordered, and $D_t$ is the demand during period $t$. The demand is assumed to be i.i.d. In other words, if we change the notation $\xi_{t+1}$ to $D_{t+1}$, the dynamics are defined by equation (5.1) with $f(x,a,D) = x + a - D$. Of course, this function is continuous.

The model has the following decision-making scenario: a decision-maker views the current inventory of a single commodity and makes an ordering decision. Assuming zero lead times, the products are immediately available to meet demand. Demand is then realized, the decision-maker views the remaining inventory, and the process continues. Assume that unmet demand is backlogged and that the cost of inventory held or backlogged (negative inventory) is modeled as a convex function. The demand and the order quantity are assumed to be nonnegative. The dynamics of the system are defined by (6.1). Let

• $\alpha \in (0,1)$ be the discount factor,

• $K > 0$ be a fixed ordering cost,

• $c > 0$ be the per unit ordering cost,

• $D$ be a nonnegative random variable with the same distribution as $D_t$,

• $h(\cdot)$ denote the holding/backordering cost per period. It is assumed that $h : \mathbb{R} \to \mathbb{R}$ is a convex function, $h(x) \to \infty$ as $|x| \to \infty$, and $\mathbb{E}\,h(x - D) < \infty$ for all $x \in \mathbb{R}$.
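Note that $\mathbb{E}D < \infty$ since, in view of Jensen's inequality, $h(x - \mathbb{E}D) \le \mathbb{E}\,h(x - D) < \infty$. Without loss of generality, assume that $h$ is nonnegative, $h(0) = 0$, and $h(x) > 0$ for $x < 0$. Otherwise, let $x^* \in \mathbb{R}$ be the smallest point at which the function $h$ reaches its minimum value on $\mathbb{R}$ (the set of minimizers is a nonempty compact interval because $h$ is convex and $h(x) \to \infty$ as $|x| \to \infty$). Define the variable $\tilde x := x - x^*$ and the function $\tilde h(\tilde x) := h(\tilde x + x^*) - h(x^*)$, $\tilde x \in \mathbb{R}$. Then $\tilde h$ is a nonnegative convex function with $\tilde h(\tilde x) \to \infty$ as $|\tilde x| \to \infty$, $\tilde h(0) = 0$, and $\tilde h(\tilde x) > 0$ for $\tilde x < 0$.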

The cost function $c$ for this model is defined in (5.3) with $C(a) := K\,\mathbf{I}\{a > 0\} + ca$ and $H(x,a) := \mathbb{E}\,h(x + a - D)$. The function $C : \mathbb{A} \to \mathbb{R}_+$ is inf-compact. In fact, it is continuous at $a > 0$ and lower semi-continuous at $a = 0$. The function $H^* : \mathbb{A}\times\mathbb{X} \to \mathbb{R}_+$, where $H^*(a,x) := H(x,a)$ for all $(a,x) \in \mathbb{A}\times\mathbb{X}$, is $\mathbb{K}$-inf-compact because of the properties of the function $h$. Theorem 5.3 (case (ii)) implies that the function $c$ is inf-compact. Therefore, in view of Proposition 3.2, the function $c$ is $\mathbb{K}$-inf-compact.
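When the demand takes finitely many values, the one-step cost $c(x,a) = C(a) + H(x,a)$ of this model is a finite sum. The sketch below assumes a hypothetical piecewise-linear $h$ (holding cost 1 and backordering cost 4 per unit) and a hypothetical two-point demand distribution. The name `one_step_cost` is illustrative, and the per-unit cost is called `unit_cost` to avoid a clash with the cost function $c$.

```python
def one_step_cost(x, a, K, unit_cost, h, demand_pmf):
    """c(x, a) = K*I{a > 0} + unit_cost*a + E h(x + a - D) for a demand given as {value: probability}."""
    if a < 0:
        raise ValueError("the order quantity must be nonnegative (A = R_+)")
    ordering = (K if a > 0 else 0.0) + unit_cost * a                    # C(a)
    expected_h = sum(p * h(x + a - d) for d, p in demand_pmf.items())  # H(x, a)
    return ordering + expected_h

def h_example(z):
    # hypothetical convex cost: 1 per unit held, 4 per unit backordered; h(0) = 0
    return max(z, 0.0) + 4.0 * max(-z, 0.0)

pmf = {0: 0.5, 2: 0.5}
print(one_step_cost(0.0, 1.0, K=5.0, unit_cost=1.0, h=h_example, demand_pmf=pmf))  # 8.5
print(one_step_cost(0.0, 0.0, K=5.0, unit_cost=1.0, h=h_example, demand_pmf=pmf))  # 4.0
```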

The problem is posed with $\mathbb{X} = \mathbb{R}$ and $\mathbb{A} = \mathbb{R}_+$. However, if the demand and action sets are integer or lattice, the model can be restated with $\mathbb{X} = \mathbb{Z}$, where $\mathbb{Z}$ is the set of integers, and $\mathbb{A} = \{0, 1, \ldots\}$; see Remark 6.14.

Consider the following corollary of Theorems 3.4, 3.6, and 5.3.

Corollary 6.1 For the inventory control model, Assumption W* holds and the one-step cost function $c$ is inf-compact. Therefore, the conclusions of Theorems 3.4, 3.6 and Corollary 3.5 hold.

Proof. The validity of Assumption W* and inf-compactness of $c$ follow from Theorem 5.3 (case (ii)). ■

Consider the renewal process

$$ N(t) := \sup\{n \mid S_n \le t\}, \tag{6.2} $$

where $t \in \mathbb{R}_+$, $S_0 = 0$ and $S_n = \sum_{j=1}^{n} D_j$ for $n > 0$. Observe that, if $P(D > 0) > 0$, then $\mathbb{E}N(t) < \infty$ for each $t \in \mathbb{R}_+$; see Resnick [27, Theorem 3.3.1]. Thus, Wald's identity yields that, for all $y \in \mathbb{R}_+$,

$$ \mathbb{E}S_{N(y)+1} = \mathbb{E}(N(y)+1)\,\mathbb{E}D < +\infty. \tag{6.3} $$
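Identity (6.3) can be checked by simulation. The sketch below uses illustrative helper names and assumes exponential demand with mean 1, chosen because then $N(y)$ is Poisson with mean $y$ and both sides of (6.3) equal $y + 1$. It samples $N(y)$ and $S_{N(y)+1}$ path by path. The sketch requires $P(D > 0) > 0$; otherwise the inner loop would never stop, which mirrors the role of this condition in the text.

```python
import random

def sample_N_and_overshoot(y, sample_d, rng):
    """One path of the renewal process: returns (N(y), S_{N(y)+1}), where N(y) = sup{n : S_n <= y}."""
    s, n = 0.0, 0
    while True:
        d = sample_d(rng)
        if s + d > y:          # S_{n+1} is the first partial sum exceeding y, so N(y) = n
            return n, s + d
        s += d
        n += 1

def wald_sides(y, sample_d, mean_d, n_paths=100_000, seed=1):
    """Monte Carlo estimates of E S_{N(y)+1} and E(N(y) + 1) * E D, the two sides of (6.3)."""
    rng = random.Random(seed)
    total_n = total_s = 0.0
    for _ in range(n_paths):
        n, s = sample_N_and_overshoot(y, sample_d, rng)
        total_n += n
        total_s += s
    return total_s / n_paths, (total_n / n_paths + 1.0) * mean_d

lhs, rhs = wald_sides(2.0, lambda rng: rng.expovariate(1.0), mean_d=1.0)
print(lhs, rhs)  # both should be close to y + 1 = 3 for this exponential example
```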


We next state a useful lemma.

Lemma 6.2 For a fixed initial state $x$, if $P(D > 0) > 0$, then

$$ \mathcal{E}_y(x) := \mathbb{E}\,h(x - S_{N(y)+1}) < +\infty, \tag{6.4} $$

where $0 \le y < +\infty$.


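Proof. Define

$$ h^*(x) := \begin{cases} h(x), & x < 0, \\ 0, & x \ge 0. \end{cases} $$

Observe that it suffices to show that

$$ \mathcal{E}^*(x) := \mathbb{E}\,h^*(x - S_{N(y)+1}) < +\infty. \tag{6.5} $$

Indeed, for $Z = x - S_{N(y)+1}$,

$$ \mathcal{E}_y(x) = \mathbb{E}\,\mathbf{I}\{Z < 0\}h^*(Z) + \mathbb{E}\,\mathbf{I}\{Z \ge 0\}h(Z) \le \mathcal{E}^*(x) + h(x), $$

where the inequality holds because $0 \le Z \le x$ on the event $\{Z \ge 0\}$ and the convex function $h$, which attains its minimum at 0, is nondecreasing on $[0,\infty)$. To show that $\mathcal{E}^*(x) < +\infty$, we shall prove that

$$ \mathbb{E}\,h^*(x - S_{N(y)+1}) \le (1 + \mathbb{E}N(y))\,\mathbb{E}\,h^*(x - y - D) < +\infty. \tag{6.6} $$

Define the function $f(z) := h^*(x - y - z)$. This function is nondecreasing and convex. Since $f$ is convex, its derivative exists almost everywhere. Denote the excess of $N(y)$ by $R(y) := S_{N(y)+1} - y$. According to Gut [20, p. 59], for $t, y \in \mathbb{R}_+$,

$$ P\{R(y) > t\} = 1 - F_D(y + t) + \int_0^y \big(1 - F_D(y + t - s)\big)\,dU(s), $$

where $F_D$ is the distribution function of $D$ and $U(s) = \mathbb{E}N(s)$ is the renewal function. Thus,

$$ \mathbb{E}\,h^*(x - S_{N(y)+1}) = \mathbb{E}\,h^*(x - y - R(y)) = \mathbb{E}f(R(y)) = \int_0^\infty f'(t)\,P\{R(y) > t\}\,dt = J_1 + J_2, \tag{6.7} $$

where $J_1 = \int_0^\infty f'(t)\big(1 - F_D(y + t)\big)\,dt$, $J_2 = \int_0^\infty f'(t)\Big(\int_0^y \big(1 - F_D(y + t - s)\big)\,dU(s)\Big)\,dt$, and the third equality in (6.7) holds according to Feinberg [10, p. 263]. Note that, since $F_D$ is nondecreasing,

$$ J_1 \le \int_0^\infty f'(t)\big(1 - F_D(t)\big)\,dt = \mathbb{E}f(D) = \mathbb{E}\,h^*(x - y - D) \le \mathbb{E}\,h(x - y - D) < +\infty, \tag{6.8} $$

where the first equality follows from [10, p. 263]. Similarly, by applying Fubini's theorem,

$$ J_2 = \int_0^y \Big(\int_0^\infty f'(t)\big(1 - F_D(y + t - s)\big)\,dt\Big)\,dU(s) \le \int_0^y \Big(\int_0^\infty f'(t)\big(1 - F_D(t)\big)\,dt\Big)\,dU(s) = \mathbb{E}f(D)\,U(y) = \mathbb{E}\,h^*(x - y - D)\,\mathbb{E}N(y). \tag{6.9} $$

Combining (6.7)-(6.9) yields (6.6). ■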


The following proposition is useful for the average-cost criterion. In addition to this proposition, observe that the case $D = 0$ almost surely is trivial for this criterion. In this case, the policy $\phi$ that orders up to the level 0 if $x < 0$ and does nothing otherwise is average-cost optimal. For this policy $w(x) = w^\phi(x) = 0$ if $x \le 0$, and $w(x) = w^\phi(x) = h(x)$ if $x > 0$. Observe that $\phi$ is the $(0,0)$ policy. Since $w(x)$ depends on $x$, Theorem 4.1 implies that Assumption B does not hold when $D = 0$ almost surely.


Proposition 6.3 The inventory control model satisfies Assumption G and, therefore, the conclusions of Theorem 4.2 hold. Furthermore, if $P(D > 0) > 0$, then Assumption B is satisfied and the conclusions of Theorems 4.1, 4.3 and 4.5 hold.
Proof. Consider the policy $\phi$ that orders up to the level 0 if the inventory level is less than 0, and does nothing otherwise. Then $w^\phi(0) = K P(D > 0) + c\,\mathbb{E}D + \mathbb{E}\,h(-D) < +\infty$. That is, Assumption G holds.

In view of Corollary 6.1, Theorem 3.4 implies that for every discount factor $\alpha \in [0,1)$ there exists a stationary discount-optimal policy $\phi_\alpha$. Theorem 4.2 implies that $\cup_{\alpha\in[0,1)}\mathbb{X}_\alpha \subset \mathcal{K}$ for some compact set $\mathcal{K} \subset \mathbb{R}$. Let $[x^*_L, x^*_U]$ be a bounded interval in $\mathbb{R}$ such that $\mathcal{K} \subset [x^*_L, x^*_U]$. Thus,

$$ \cup_{\alpha\in[0,1)}\mathbb{X}_\alpha \subset [x^*_L, x^*_U]. $$

For a discount factor $\alpha \in [0,1)$, fix a stationary optimal policy $\phi_\alpha$ and a state $x^\alpha \in [x^*_L, x^*_U]$ such that $v_\alpha(x^\alpha) = m_\alpha$. Observe that $\phi_\alpha(x^\alpha) = 0$. Indeed, let $\phi_\alpha(x^\alpha) = a > 0$. We have

$$ \begin{aligned} v_\alpha(x^\alpha) &= K + ca + \mathbb{E}\,h(x^\alpha + a - D) + \alpha\mathbb{E}\,v_\alpha(x^\alpha + a - D) \\ &> K + c\tfrac{a}{2} + \mathbb{E}\,h\big((x^\alpha + \tfrac{a}{2}) + \tfrac{a}{2} - D\big) + \alpha\mathbb{E}\,v_\alpha\big((x^\alpha + \tfrac{a}{2}) + \tfrac{a}{2} - D\big) \ge v_\alpha(x^\alpha + \tfrac{a}{2}), \end{aligned} $$

where the second inequality holds because the middle expression is the cost of ordering $a/2$ in state $x^\alpha + a/2$ and acting optimally afterwards, while ordering $a/2$ need not be optimal in that state. The inequality $v_\alpha(x^\alpha) > v_\alpha(x^\alpha + a/2)$ contradicts $v_\alpha(x^\alpha) = m_\alpha$.

Let $\sigma$ be the policy defined by the following rules depending on the initial state $x$: (i) if $x < x^\alpha$, then at the initial time instance $\sigma$ orders up to the level $x^\alpha$ and then switches to $\phi_\alpha$, and (ii) if $x \ge x^\alpha$, the policy $\sigma$ does not order as long as the inventory level is greater than or equal to $x^\alpha$ and, starting from the time when the inventory level becomes smaller than $x^\alpha$, the policy $\sigma$ behaves as described in (i) starting from time 0.

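For $x < x^\alpha$,

$$ v^\sigma_\alpha(x) = K + c(x^\alpha - x) + v_\alpha(x^\alpha) \le K + c(x^*_U - x) + m_\alpha. \tag{6.10} $$

The inequality in (6.10) yields, for $x < x^\alpha$,

$$ v_\alpha(x) - m_\alpha \le v^\sigma_\alpha(x) - m_\alpha \le K + c(x^*_U - x) < +\infty. \tag{6.11} $$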


For $x \ge x^\alpha$,

$$ v_\alpha(x) \le v^\sigma_\alpha(x) = \mathbb{E}\Big[\sum_{t=1}^{N(x - x^\alpha)+1} \alpha^{t-1}h(x_t) + \alpha^{N(x - x^\alpha)+1}\big[K + c(x^\alpha - x_{N(x - x^\alpha)+1}) + v_\alpha(x^\alpha)\big]\Big]. \tag{6.12} $$

Let $\mathcal{E}(x) := h(x) + \mathcal{E}_{x - x^*_L}(x) < \infty$, where the function $\mathcal{E}_y(x)$ is defined in (6.4) and its finiteness is stated in Lemma 6.2. Since the nonnegative function $h$ is convex, for $x_t = x - S_t$, $t = 1, \ldots, N(x - x^*_L) + 1$,

$$ 0 \le h(x_t) \le \max\{h(x - S_{N(x - x^*_L)+1}), h(x)\} \le h(x) + h(x - S_{N(x - x^*_L)+1}) \tag{6.13} $$

and

$$ \mathbb{E}\,h(x_t) \le h(x) + \mathbb{E}\,h(x - S_{N(x - x^*_L)+1}) = \mathcal{E}(x). \tag{6.14} $$

Observe that

$$ \mathbb{E}\Big[\sum_{t=1}^{N(x - x^\alpha)+1} \alpha^{t-1}h(x_t)\Big] \le \mathbb{E}\Big[\sum_{t=1}^{N(x - x^*_L)+1} h(x_t)\Big] \le \mathcal{E}(x)\big(1 + \mathbb{E}N(x - x^*_L)\big), \tag{6.15} $$

where the first inequality follows from $x^*_L \le x^\alpha$ and $\alpha \in [0,1)$; the second inequality follows from $x^*_L \le x^\alpha$, (6.13), (6.14), and Wald's identity. In addition,

$$ \begin{aligned} \mathbb{E}\Big[\alpha^{N(x - x^\alpha)+1}\big[K + c(x^\alpha - x_{N(x - x^\alpha)+1}) + v_\alpha(x^\alpha)\big]\Big] &\le K + c\big(x^\alpha - x + \mathbb{E}S_{N(x - x^\alpha)+1}\big) + m_\alpha \\ &\le K + c\big(1 + \mathbb{E}N(x - x^*_L)\big)\mathbb{E}D + m_\alpha, \end{aligned} \tag{6.16} $$

where the first inequality follows from $\alpha \in [0,1)$, $x_t = x - S_t$, and $v_\alpha(x^\alpha) = m_\alpha$; the second inequality follows from $x \ge x^\alpha \ge x^*_L$ and Wald's identity. Formulae (6.12), (6.15), and (6.16) imply that, for $x \ge x^\alpha$,

$$ v_\alpha(x) - m_\alpha \le K + \big(\mathcal{E}(x) + c\,\mathbb{E}D\big)\big(1 + \mathbb{E}N(x - x^*_L)\big) < +\infty. \tag{6.17} $$

Inequalities (6.11) and (6.17) imply that Assumption B holds. ■

Consider a nonnegative, real-valued, lower semi-continuous terminal value $F$. In view of Corollaries 3.5 and 6.1, Theorems 3.4 and 4.1, and Proposition 6.3, equations (3.9), (3.8) and, for the case $P(D > 0) > 0$, inequality (4.2) can be rewritten as

$$ v_{t+1,F,\alpha}(x) = \min\Big\{\min_{a \ge 0}\big[K + G_{t,F,\alpha}(x + a)\big],\ G_{t,F,\alpha}(x)\Big\} - cx, \tag{6.18} $$

$$ v_\alpha(x) = \min\Big\{\min_{a \ge 0}\big[K + G_\alpha(x + a)\big],\ G_\alpha(x)\Big\} - cx, \tag{6.19} $$

$$ w + u(x) \ge \min\Big\{\min_{a \ge 0}\big[K + H(x + a)\big],\ H(x)\Big\} - cx, \tag{6.20} $$

where $t = 0,1,\ldots$ and $w := w(x) = w^* = \underline{w} = \overline{w}$, $x \in \mathbb{X}$, the last three equalities hold in view of (4.6), and

$$ G_{t,F,\alpha}(x) = cx + \mathbb{E}\,h(x - D) + \alpha\mathbb{E}\,v_{t,F,\alpha}(x - D), \tag{6.21} $$

$$ G_\alpha(x) = cx + \mathbb{E}\,h(x - D) + \alpha\mathbb{E}\,v_\alpha(x - D), \tag{6.22} $$

$$ H(x) = cx + \mathbb{E}\,h(x - D) + \mathbb{E}\,u(x - D). \tag{6.23} $$


We explain the correctness of (6.18). The explanations for (6.19) and (6.20) are similar. For this particular problem, optimality equation (3.9) is equivalent to $v_{t+1,F,\alpha}(x) = \min\{\inf_{a > 0}[K + G_{t,F,\alpha}(x + a)],\ G_{t,F,\alpha}(x)\} - cx$, and the internal infimum can be replaced with the minimum in (6.18) because of the following two arguments:

(i) the function $K + G_{t,F,\alpha}(y)$ is lower semi-continuous on $[x,\infty)$ and $G_{t,F,\alpha}(y) \to \infty$ as $y \to \infty$, and

(ii) $K + G_{t,F,\alpha}(x) > G_{t,F,\alpha}(x)$ since $K > 0$.
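Equations (6.19) and (6.22) translate directly into value iteration on a grid. The following sketch is a hypothetical numerical illustration, not the paper's method. It assumes integer-valued demand, in the spirit of the restatement on $\mathbb{X} = \mathbb{Z}$ mentioned after Corollary 5.3's application above (Remark 6.14), and it truncates the state space to $\{lo, \ldots, hi\}$ by clamping next states below `lo` to `lo` and by allowing orders only up to `hi`; both truncations are assumptions of the sketch. The inner minimum over $a \ge 0$ in (6.19) is computed with suffix minima of $G$ over $y = x + a \ge x$. All parameter values and function names are invented for illustration.

```python
import numpy as np

def g_from_v(v, xs, unit_cost, h, demand_pmf, alpha, lo, hi):
    """G(y) = c*y + E h(y - D) + alpha * E v(y - D) on the integer grid xs, cf. (6.22).
    Next states below lo are clamped to lo (a truncation assumed for this sketch)."""
    G = unit_cost * xs.astype(float)
    for d, p in demand_pmf.items():
        nxt = np.clip(xs - d, lo, hi) - lo            # grid index of the next state
        G = G + p * (h(xs - d) + alpha * v[nxt])
    return G

def bellman(G, xs, K, unit_cost):
    """Right-hand side of (6.19): min{ K + min_{y >= x} G(y), G(x) } - c*x, with orders limited to y <= hi."""
    best_up = np.minimum.accumulate(G[::-1])[::-1]    # best_up[i] = min_{j >= i} G[j]
    return np.minimum(K + best_up, G) - unit_cost * xs

def inventory_value_iteration(K, unit_cost, h, demand_pmf, alpha, lo, hi, n_iter=500):
    xs = np.arange(lo, hi + 1)
    v = np.zeros(len(xs))                             # start from the zero function
    for _ in range(n_iter):
        v = bellman(g_from_v(v, xs, unit_cost, h, demand_pmf, alpha, lo, hi), xs, K, unit_cost)
    return xs, g_from_v(v, xs, unit_cost, h, demand_pmf, alpha, lo, hi), v

def h_example(z):
    # hypothetical convex cost: 1 per unit held, 4 per unit backordered
    return 1.0 * np.maximum(z, 0) + 4.0 * np.maximum(-z, 0)

demand_example = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}    # hypothetical integer demand
xs, G, v = inventory_value_iteration(K=5.0, unit_cost=1.0, h=h_example,
                                     demand_pmf=demand_example, alpha=0.9, lo=-40, hi=40)
S = int(xs[np.argmin(G)])                             # a minimizer of G_alpha on the grid
print("grid minimizer of G:", S)
```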

We remark that, in general, while equations (6.18) and (6.19) are necessary and sufficient conditions of optimality, inequality (6.20) is a sufficient condition of optimality. Also, if $P(D = 0) = 1$, then inequality (6.20) does not hold because $w(x)$ is not a constant function, as explained before Proposition 6.3.

Corollary 6.4 Let $\alpha \in [0,1)$. The following statements hold:

(a) the function $G_\alpha(\cdot)$ is lower semi-continuous,

(b) if $F$ is nonnegative, real-valued, and lower semi-continuous, then the functions $\{G_{t,F,\alpha}(\cdot)\}_{t=0,1,\ldots}$ are lower semi-continuous, and

(c) if $P(D > 0) > 0$, then $H$ is lower semi-continuous.
Proof. In view of (6.21)-(6.23), each of these functions is a sum of several functions, two of which are continuous and the third one is lower semi-continuous, as follows from Corollary 6.1 and from Proposition 6.3. ■


Lemma 6.5 Let $\alpha \in [0,1)$. Then $G_\alpha(x) < +\infty$ for all $x \in \mathbb{X}$. Furthermore, if $0 \le F(x) \le v_\alpha(x)$ for all $x \in \mathbb{X}$, then $G_{t,F,\alpha}(x) < +\infty$ for all $x \in \mathbb{X}$ and for all $t = 0,1,\ldots$.


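Proof. Since $G_{t,F,\alpha} \le G_\alpha$, in view of (6.22) the lemma follows from $\mathbb{E}\,v_\alpha(x - D) < +\infty$. To prove this inequality, consider the policy $\phi$ that orders up to the level 0 if the inventory level is nonpositive and orders nothing otherwise. For $x \le 0$,

$$ v_\alpha(x) \le v^\phi_\alpha(x) \le K - cx + \frac{\alpha\big(K + c\,\mathbb{E}D + \mathbb{E}\,h(-D)\big)}{1 - \alpha}. \tag{6.24} $$

Letting $B_\alpha := \alpha\big(K + c\,\mathbb{E}D + \mathbb{E}\,h(-D)\big)/(1 - \alpha)$, we have $\mathbb{E}\,v_\alpha(x - D) \le K - c\,\mathbb{E}(x - D) + B_\alpha < +\infty$. For $x > 0$,

$$ \begin{aligned} v_\alpha(x) \le v^\phi_\alpha(x) &= \mathbb{E}\Big[\sum_{t=1}^{N(x)+1} \alpha^{t-1}h(x - S_t) + \alpha^{N(x)+1}v^\phi_\alpha(x - S_{N(x)+1})\Big] \\ &\le h(x)\,\mathbb{E}N(x) + \mathbb{E}\,h(x - S_{N(x)+1}) + K - c\big(x - \mathbb{E}S_{N(x)+1}\big) + B_\alpha < +\infty, \end{aligned} $$

where the second inequality follows from the facts that $\alpha^t \le 1$ for $t \ge 1$, $0 \le h(x - S_t) \le h(x)$ for $t = 1, \ldots, N(x)$, and (6.24), and the last inequality holds because $\mathbb{E}N(x) < \infty$, Lemma 6.2, and (6.3). Let $\alpha \in (0,1)$. Since $v^\phi_\alpha(x) = \mathbb{E}\,h(x - D) + \alpha\mathbb{E}\,v^\phi_\alpha(x - D) < +\infty$, we have $\mathbb{E}\,v_\alpha(x - D) \le \mathbb{E}\,v^\phi_\alpha(x - D) < +\infty$. In addition, $\mathbb{E}\,v_0(x - D) \le \mathbb{E}\,v^\phi_\alpha(x - D) < +\infty$. The result follows. ■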


Recall the following classic definition.

Definition 6.6 For a real number $K \ge 0$, a function $f : \mathbb{R} \to \mathbb{R}$ is called $K$-convex if, for each $x \le y$ and for each $\lambda \in (0,1)$,

$$ f((1-\lambda)x + \lambda y) \le (1-\lambda)f(x) + \lambda f(y) + \lambda K. $$
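Definition 6.6 can be tested numerically on a finite set of points. The sketch below uses a hypothetical helper that checks the defining inequality for all pairs of grid points and a finite set of weights $\lambda$; it can only detect violations, so a `True` result is evidence, not a proof. The examples rely on two elementary facts: a convex function is $0$-convex, and a step function $J\,\mathbf{I}\{x < 0\}$ with $J \ge 0$ is $K$-convex exactly when $J \le K$.

```python
import itertools
import numpy as np

def looks_k_convex(f, K, grid, lambdas=None, tol=1e-12):
    """Check f((1-l)x + l y) <= (1-l) f(x) + l f(y) + l K for all grid pairs x < y and sampled l in (0, 1).
    Returns False as soon as a violation is found; True means no violation on the sampled points."""
    if lambdas is None:
        lambdas = np.linspace(0.05, 0.95, 19)
    for x, y in itertools.combinations(sorted(grid), 2):
        for lam in lambdas:
            z = (1 - lam) * x + lam * y
            if f(z) > (1 - lam) * f(x) + lam * f(y) + lam * K + tol:
                return False
    return True

grid = np.linspace(-3.0, 3.0, 13)
print(looks_k_convex(lambda t: t * t, 0.0, grid))                  # True: convex, hence 0-convex
print(looks_k_convex(lambda t: 1.0 if t < 0 else 0.0, 1.0, grid))  # True: downward jump of size K = 1
print(looks_k_convex(lambda t: 2.0 if t < 0 else 0.0, 1.0, grid))  # False: jump of size 2 > K
```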


The following lemma summarizes some properties of $K$-convex functions.


Lemma 6.7 The following statements hold for a $K$-convex function $g : \mathbb{R} \to \mathbb{R}$:

1. If the function $g$ is measurable and $D$ is a random variable, then $\mathbb{E}\,g(x - D)$ is also $K$-convex provided $\mathbb{E}|g(x - D)| < \infty$ for all $x \in \mathbb{R}$.

2. Suppose $g$ is inf-compact (that is, lower semi-continuous and $g(x) \to \infty$ as $|x| \to \infty$). Let

$$ S \in \arg\min_{x\in\mathbb{R}} g(x), \tag{6.25} $$

$$ s = \inf\{x \le S \mid g(x) \le K + g(S)\}. \tag{6.26} $$

Then

(a) $g(S) + K < g(x)$ for all $x < s$,

(b) $g(x)$ is decreasing on $(-\infty, s]$ and, therefore, $g(s) < g(x)$ for all $x < s$,

(c) $g(x) \le g(z) + K$ for all $x$ and $z$ such that $s \le x \le z$.
Proof. See Bertsekas [2, Lemma 4.2.1] and Simchi-Levi et al. [32, Lemma 8.3.2] for the case of a continuous function $g$. The proofs there, with minor adjustments, cover the case when $g$ satisfies the measurability and semi-continuity properties stated in the lemma. ■

Consider the discounted cost problem and suppose $G_\alpha$ is $K$-convex, lower semi-continuous, and approaches infinity as $|x| \to \infty$. If we define $S_\alpha$ and $s_\alpha$ by (6.25) and (6.26) with $g$ replaced by $G_\alpha$, then Statement 2(a) of Lemma 6.7, along with the optimality equation (6.19), implies that it is optimal to order up to $S_\alpha$ when $x < s_\alpha$. Statement 2(c) of Lemma 6.7 implies that it is optimal not to order when $x \ge s_\alpha$. Our next goal is to establish these properties of the function $G_\alpha$ and of some relevant functions.
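Given values of $g$ on a finite grid, the thresholds in (6.25) and (6.26) can be approximated as in the following sketch. The helper is hypothetical; on a grid, $s$ is replaced by the smallest grid point $x \le S$ with $g(x) \le K + g(S)$. For the example $g(x) = (x-3)^2$ with $K = 4$, this gives $S = 3$ and $s = 1$, since $g(1) = 4 = K + g(3)$.

```python
import numpy as np

def thresholds_on_grid(grid, g_values, K):
    """Grid versions of (6.25)-(6.26): S is a minimizer of g, and s is the smallest
    grid point x <= S with g(x) <= K + g(S)."""
    i_S = int(np.argmin(g_values))
    admissible = np.flatnonzero(g_values[: i_S + 1] <= K + g_values[i_S])
    return float(grid[admissible[0]]), float(grid[i_S])

grid = np.linspace(-5.0, 10.0, 31)              # step 0.5
s, S = thresholds_on_grid(grid, (grid - 3.0) ** 2, K=4.0)
print(s, S)                                     # 1.0 3.0
```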

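For a fixed ordering cost $K \ge 0$ we sometimes write $v^K_\alpha$, $v^K_{t,\alpha}$, $v^K_{t,F,\alpha}$, $G^K_\alpha$, and $G^K_{t,F,\alpha}$ instead of $v_\alpha$, $v_{t,\alpha}$, $v_{t,F,\alpha}$, $G_\alpha$, and $G_{t,F,\alpha}$, respectively. Consider the terminal value $F(x) = v^0_\alpha(x)$, $x \in \mathbb{X}$. According to Theorem 3.4(viii) and Corollary 3.5(vi), the functions $v_\alpha$, $v^0_\alpha$, $v_{t,\alpha}$, and $v_{t,v^0_\alpha,\alpha}$, $t = 1,2,\ldots$, are inf-compact.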

Lemma 6.8 The following statements hold:

(i) the functions $v_\alpha$ and $v_{t,v^0_\alpha,\alpha}$, $t = 0,1,\ldots$, are inf-compact;

(ii) the functions $G_\alpha$ and $G_{t,v^0_\alpha,\alpha}$, $t = 0,1,\ldots$, are lower semi-continuous, and

$$ \lim_{x\to+\infty} G_\alpha(x) = \lim_{x\to+\infty} G_{t,v^0_\alpha,\alpha}(x) = +\infty, \qquad t = 0,1,\ldots; $$

(iii) there exists $\alpha^* \in [0,1)$ such that $G^0_\alpha(x) \to \infty$ as $x \to -\infty$ for all $\alpha \in [\alpha^*, 1)$;

(iv) for $\alpha \in [\alpha^*, 1)$, where $\alpha^* \in [0,1)$ is the constant whose existence is stated in Statement (iii), the functions $G_\alpha(x)$ and $G_{t,v^0_\alpha,\alpha}(x)$, $t = 0,1,\ldots$, are $K$-convex and tend to $+\infty$ as $x \to -\infty$, and therefore, in view of Statement (ii), these functions are inf-compact. Furthermore, the functions $v_\alpha$ and $v_{t,v^0_\alpha,\alpha}(x)$, $t = 0,1,\ldots$, are $K$-convex.

Proof. In view of Corollary 6.1, Statement (i) follows from Theorem 3.4(viii) and Corollary 3.5(vi). Statement (ii) follows from Statement (i), nonnegativity of costs, and definitions (6.21) and (6.22).

To prove Statement (iii), note that it is well known that the function $G^0_\alpha$ is convex, where $\alpha \in [0,1)$. Indeed, the function $v^0_{0,\alpha} = 0$ is convex. For $K = 0$, equations (6.18), (6.21) and induction based on Heyman and Sobel [22, Proposition B-4] imply that the functions $v^0_{t,\alpha}$, $t = 1,2,\ldots$, are convex. Convergence of value iterations, stated in Theorem 3.4(i), implies the convexity of the functions $v^0_\alpha$. The convexity of $G^0_\alpha$ follows from (6.22).

We show by contradiction that there exists $\alpha^* \in [0,1)$ such that $G^0_\alpha$ is decreasing on an interval $(-\infty, M_\alpha]$ for some $M_\alpha > -\infty$ when $\alpha \in [\alpha^*, 1)$. Suppose this is not the case. For $K = 0$, (6.19) can be written as

$$ v^0_\alpha(x) = \inf_{a \ge 0}\{G^0_\alpha(x + a)\} - cx. \tag{6.27} $$

If a constant $M_\alpha$ does not exist for some $\alpha \in (0,1)$, then the convexity of $G^0_\alpha(x)$ implies that the policy $\psi$ that never orders is optimal for the discount factor $\alpha$. If there is no $\alpha^*$ with the described property, Corollary 4.4 implies that the policy $\psi$ is average-cost optimal. This is impossible because, if $x$ is small enough that the convex function $h$ is decreasing at $x$, then $w^\psi(x) \ge \mathbb{E}\,h(x - D) \ge h(x) \to +\infty$ as $x \to -\infty$, but, in view of Theorem 4.1, $w(x)$ is a finite constant. This contradiction implies that, for $\alpha \in [\alpha^*, 1)$, the functions $G^0_\alpha$ decrease when $x \in (-\infty, M_\alpha]$, where $M_\alpha > -\infty$. The convexity of $G^0_\alpha$ implies that $G^0_\alpha(x) \to \infty$ as $x \to -\infty$.

Let us prove Statement (iv). The convergence of the functions to $+\infty$ as $x \to -\infty$ follows from Statement (iii) and the inequalities $G^K_\alpha(x) \ge G^0_\alpha(x)$ and $G^K_{t,v^0_\alpha,\alpha}(x) \ge G^0_\alpha(x)$, which hold for all $x \in \mathbb{X}$. Indeed, the first inequality follows from $v^K_\alpha(x) \ge v^0_\alpha(x)$, $x \in \mathbb{X}$, and (6.22). The second inequality follows from $v^K_{t,v^0_\alpha,\alpha}(x) \ge v^0_{t,v^0_\alpha,\alpha}(x) = v^0_\alpha(x)$, $x \in \mathbb{X}$, and (6.21).

Now let $\alpha \in [\alpha^*, 1)$. As explained in the proof of (iii), the function $G^0_\alpha$ is convex and therefore it is $K$-convex. Formulae (6.18), (6.21), Heyman and Sobel [22, Lemma 7-2, p. 312], and induction arguments imply that the functions $G_{t,v^0_\alpha,\alpha}$ and $v_{t+1,v^0_\alpha,\alpha}$, $t = 1,2,\ldots$, are $K$-convex. In addition, $v_{t,v^0_\alpha,\alpha}(x) \uparrow v_\alpha(x)$ as $t \to \infty$ in view of Corollary 3.5(v) and since all the costs are nonnegative. Formulae (6.21), (6.22) and the monotone convergence theorem imply that $G_{t,v^0_\alpha,\alpha}(x) \uparrow G_\alpha(x)$ as $t \to \infty$. Thus, the functions $v_\alpha$ and $G_\alpha$ are $K$-convex. ■

Definition 6.9 Let $s_t$ and $S_t$ be real numbers such that $s_t \le S_t$, $t = 0, 1, \ldots$. Suppose $x_t$ denotes the current inventory level at decision epoch $t$. A policy is called an $(s_t, S_t)$ policy at step $t$ if it orders up to the level $S_t$ if $x_t < s_t$ and does not order when $x_t \ge s_t$. A Markov policy is called an $(s_t, S_t)$ policy if it is an $(s_t, S_t)$ policy at all steps $t = 0, 1, \ldots$. A policy is called an $(s, S)$ policy if it is stationary and it is an $(s, S)$ policy at all steps $t = 0, 1, \ldots$.
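Definition 6.9 translates directly into an ordering rule. The following sketch applies an $(s_t, S_t)$ rule along a fixed demand path, assuming backorders are carried as negative inventory so that the next level is $x_t + a_t - d$. The function names and the deterministic demand path are illustrative only.

```python
from typing import List, Sequence, Tuple


def sS_action(x: float, s: float, S: float) -> float:
    """Order quantity of an (s, S) rule at inventory level x (requires s <= S)."""
    if s > S:
        raise ValueError("an (s, S) rule needs s <= S")
    # Order up to S when x < s; do not order when x >= s.
    return S - x if x < s else 0.0


def run_markov_sS(x0: float, thresholds: Sequence[Tuple[float, float]],
                  demands: Sequence[float]) -> List[Tuple[float, float]]:
    """Apply the (s_t, S_t) rule at steps t = 0, 1, ... to a fixed demand path.

    Returns the (x_t, a_t) pairs; the next level is x_t + a_t - d (backorders allowed).
    """
    x, history = x0, []
    for (s_t, S_t), d in zip(thresholds, demands):
        a = sS_action(x, s_t, S_t)
        history.append((x, a))
        x = x + a - d
    return history


# Stationary (s, S) = (2, 10): the same pair is used at every step.
print(run_markov_sS(5, [(2, 10)] * 3, [4, 3, 5]))   # [(5, 0.0), (1.0, 9.0), (7.0, 0.0)]
```

Passing the same pair at every step gives a stationary $(s, S)$ policy; passing a different pair at each step gives a Markov $(s_t, S_t)$ policy.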

The following theorem is the main result of this section. 

Theorem 6.10 Consider $\alpha^* \in [0,1)$ whose existence is stated in Lemma 6.8. The following statements hold for the inventory control problem.

(i) For $\alpha \in [\alpha^*, 1)$ and $t = 0, 1, \ldots$, define $g(x) := G_{t,v^0_\alpha,\alpha}(x)$, $x \in \mathbb{R}$. Consider real numbers $S^*_{t,\alpha}$ satisfying (6.25) and $s^*_{t,\alpha}$ defined in (6.26). For each $N = 1, 2, \ldots$, the $(s_t, S_t)$ policy with $s_t = s^*_{N-t-1,\alpha}$ and $S_t = S^*_{N-t-1,\alpha}$, $t = 0, 1, \ldots, N-1$, is optimal for the $N$-horizon problem with the terminal values $F(x) = v^0_\alpha(x)$.

(ii) For the infinite-horizon expected total discounted cost criterion with a discount factor $\alpha \in [\alpha^*, 1)$, define $g(x) := G_\alpha(x)$, $x \in \mathbb{R}$. Consider real numbers $S_\alpha$ satisfying (6.25) and $s_\alpha$ defined in (6.26). The $(s_\alpha, S_\alpha)$ policy is optimal for the discount factor $\alpha$. Furthermore, the sequence of pairs $\{(s^*_{t,\alpha}, S^*_{t,\alpha})\}_{t=0,1,\ldots}$ is bounded, where $s^*_{t,\alpha}$ and $S^*_{t,\alpha}$ are described in Statement (i). If $(s^*, S^*)$ is a limit point of this sequence, then the $(s^*, S^*)$ policy is optimal for the infinite-horizon problem with the discount factor $\alpha$.

(iii) Consider the infinite-horizon average cost criterion. For each $\alpha \in [\alpha^*, 1)$, consider an optimal $(s'_\alpha, S'_\alpha)$ policy for the discounted cost criterion with the discount factor $\alpha$, whose existence follows from Statement (ii). Let $\alpha_n \uparrow 1$, $n = 1, 2, \ldots$, with $\alpha_1 \ge \alpha^*$. Every sequence $\{(s'_{\alpha_n}, S'_{\alpha_n}), n \ge 1\}$ is bounded, and each of its limit points $(s, S)$ defines an average-cost optimal $(s, S)$ policy. Furthermore, if $P(D > 0) > 0$, this policy satisfies the optimality inequality (6.20) with $u = \tilde u$, where the function $\tilde u$ is defined in (4.5) for an arbitrary subsequence $\{\alpha_{n_k}\}_{k=1,2,\ldots}$ of $\{\alpha_n, n \ge 1\}$ satisfying $(s, S) = \lim_{k\to\infty} (s'_{\alpha_{n_k}}, S'_{\alpha_{n_k}})$.
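Statements (i) and (ii) obtain the thresholds from a function $g$ via (6.25) and (6.26). Those formulas are not reproduced in this section; the sketch below assumes their usual forms, namely that $S$ minimizes $g$ and $s = \inf\{x \le S : g(x) \le K + g(S)\}$, and evaluates them on a finite grid. The function `sS_from_g` and the quadratic $g$ are hypothetical, and grid resolution limits the accuracy of the returned pair.

```python
def sS_from_g(xs, gs, K):
    """Grid version of the assumed (6.25)-(6.26) for sorted grid points xs.

    S is the first grid minimizer of g, and s is the smallest grid point
    x <= S with g(x) <= K + g(S).
    """
    j = min(range(len(xs)), key=gs.__getitem__)               # index of S
    i = next(k for k in range(j + 1) if gs[k] <= K + gs[j])   # index of s
    return xs[i], xs[j]


xs = list(range(-5, 6))
gs = [(x - 2) ** 2 for x in xs]      # illustrative convex g with minimum at x = 2
print(sS_from_g(xs, gs, K=9))        # (-1, 2)
print(sS_from_g(xs, gs, K=0))        # (2, 2)
```

With $K = 0$ the scan stops at the first minimizer, so $s = S$, which matches the base-stock case discussed in Remark 6.13.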

Proof. To prove Statements (i) and (ii), let $\alpha \in [\alpha^*, 1)$. In view of Lemma 6.8(iv), the functions $G_\alpha$ and $G_{t,v^0_\alpha,\alpha}$, $t = 0, 1, \ldots$, are $K$-convex and inf-compact. The optimality of $(s_t, S_t)$ policies and $(s, S)$ policies with $s = s_\alpha$ and $S = S_\alpha$ stated in (i) and (ii) follows from optimality equations (6.18), (6.19), Lemma 6.7 with $g = G_{t,v^0_\alpha,\alpha}$ and $g = G_\alpha$ respectively, and Theorem 3.4.

Consider now the remaining claims in (ii). Since $G^0_\alpha(x) \le G_{t,v^0_\alpha,\alpha}(x) \le G_{t+1,v^0_\alpha,\alpha}(x) \le G_\alpha(x)$, $x \in \mathbb{R}$, the points $s^*_{t,\alpha}$ and $S^*_{t,\alpha}$ belong to the compact set $\{x \in \mathbb{R} : G^0_\alpha(x) \le K + \min_{x \in \mathbb{R}} G_\alpha(x)\}$. Therefore, the sequence $\{(s^*_{t,\alpha}, S^*_{t,\alpha})\}_{t=0,1,\ldots}$ is bounded and has a limit point $(s^*, S^*)$. The function $F(x) = v^0_\alpha(x)$ satisfies the inequalities in (3.11), and therefore the assumptions of Theorem 3.6 hold. Theorem 3.6 implies that, for the infinite-horizon problem with the discount factor $\alpha$, the following decisions are optimal for the corresponding states: no inventory should be ordered for $x > s^*$, and the inventory up to the level $S^*$ should be ordered for $x < s^*$. This implies that $G_\alpha(x) \le K + G_\alpha(S^*)$ for $x \in (s^*, S^*)$. Lower semi-continuity of $G_\alpha(x)$ implies that $G_\alpha(s^*) \le K + G_\alpha(S^*)$. Thus, the decision that inventory should not be ordered is optimal at $x = s^*$. That is, the $(s^*, S^*)$ policy is optimal for the infinite-horizon problem with the discount factor $\alpha$.

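It remains to prove Statement (iii). Let $P(D > 0) > 0$. We start with the proof that the sequence $\{(s'_{\alpha_n}, S'_{\alpha_n})\}_{n=1,2,\ldots}$ is bounded. First, we prove that the sequence $\{s'_{\alpha_n}, n \ge 1\}$ is bounded below. If this is not true, then $\lim_{k\to\infty} s'_{\alpha_{n_k}} = -\infty$ for some subsequence $n_k \to \infty$ as $k \to \infty$. This means that for each $x \in \mathbb{R}$ there is a natural number $k(x)$ such that $x > s'_{\alpha_{n_k}}$ for $k \ge k(x)$. Therefore, $0 \in A_{\alpha_{n_k}}(y)$, $k \ge k(x)$, for all $y \ge x$. Corollary 4.4(ii) implies that $0 \in A_{\tilde u}(y)$ for all $y \ge x$, where $\tilde u$ is defined in (4.5) for the sequence of discount factors $\{\alpha_{n_k}, k \ge 1\}$. Since $x \in \mathbb{R}$ is arbitrary, $0 \in A_{\tilde u}(y)$ for all $y \in \mathbb{R}$. This means that the policy $\psi$ that never orders inventory is optimal for average costs per unit time. However, $\mathbb{E}\,h(x - S_n) \ge h(x - n\,\mathbb{E}D)$ by Jensen's inequality, where $S_n$ is the total demand in the first $n$ periods, and therefore

$$w^{\psi}(x) \ge \liminf_{n\to\infty} \mathbb{E}\,h(x - S_n) \ge \liminf_{n\to\infty} h(x - n\,\mathbb{E}D).$$

Since $\mathbb{E}D > 0$, letting $n \to \infty$ on the right-hand side yields $w^{\psi}(x) = +\infty$ for all $x \in \mathbb{R}$. In view of Assumption G, which holds for the inventory control problem, $w(x) < +\infty$ for some $x \in \mathbb{R}$. Thus, the sequence $\{s'_{\alpha_n}, n \ge 1\}$ is bounded below.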

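Second, we prove that the sequence $\{S'_{\alpha_n}, n \ge 1\}$ is also bounded. Let $x \in \mathbb{R}$ satisfy $x < s'_{\alpha_n}$ for all $n \ge 1$; such an $x$ exists because $\{s'_{\alpha_n}, n \ge 1\}$ is bounded below. Thus, $a^{(n)} := S'_{\alpha_n} - x \in A_{\alpha_n}(x)$. In view of Theorem 4.5, the sequence $\{a^{(n)}, n \ge 1\}$ is bounded. This implies that the sequence $\{S'_{\alpha_n}, n \ge 1\}$ is bounded as well. Since $s'_{\alpha_n} \le S'_{\alpha_n}$, the sequence $\{s'_{\alpha_n}, n \ge 1\}$ is also bounded above.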

Consider a subsequence $\alpha_{n_k} \uparrow 1$ such that $(s'_{\alpha_{n_k}}, S'_{\alpha_{n_k}}) \to (s', S')$ as $k \to \infty$. Corollary 4.4(ii) implies that $0 \in A_{\tilde u}(x)$ if $x > s'$, and $S' - x \in A_{\tilde u}(x)$ if $x < s'$, where the function $\tilde u$ is defined in (4.5) for the sequence of discount factors $\{\alpha_{n_k}, k \ge 1\}$. The last step is to prove that $0 \in A_{\tilde u}(s')$. To do this, consider a subsequence $\{\alpha^*_n, n \ge 1\}$ of the sequence $\{\alpha_{n_k}, k \ge 1\}$ with $\alpha^*_n \to 1$, and a sequence $\{x^{(n)}, n \ge 1\}$ with $x^{(n)} \to s'$ such that $\tilde u(s') = \lim_{n\to\infty} u_{\alpha^*_n}(x^{(n)})$.
First, consider the case when there is a sequence $n_k \to \infty$ such that $x^{(n_k)} \ge s'_{\alpha^*_{n_k}}$ for all $k = 1, 2, \ldots$. In this case, $0 \in A_{\alpha^*_{n_k}}(x^{(n_k)})$, and Corollary 4.4(ii) implies that $0 \in A_{\tilde u^*}(s')$, where the function $\tilde u^*$ is defined in (4.5) for the sequence of discount factors $\{\alpha^*_{n_k}\}_{k=1,2,\ldots}$. Observe that $\tilde u^*(s') = \tilde u(s')$ and $\tilde u^*(x) \ge \tilde u(x)$ for all $x \in \mathbb{R}$. This implies $A_{\tilde u^*}(s') \subseteq A_{\tilde u}(s')$. Thus, $0 \in A_{\tilde u}(s')$.

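Second, consider the complementary case, when there exists a number $N$ such that $x^{(n)} < s'_{\alpha^*_n}$ for $n \ge N$. Let $n \ge N$. In view of Statement 2(b) of Lemma 6.7, $G_{\alpha^*_n}(x^{(n)}) \ge G_{\alpha^*_n}(s'_{\alpha^*_n})$. Therefore,

$$\begin{aligned}
u_{\alpha^*_n}(x^{(n)}) &= v_{\alpha^*_n}(x^{(n)}) - m_{\alpha^*_n} = K + G_{\alpha^*_n}(S'_{\alpha^*_n}) - c\,x^{(n)} - m_{\alpha^*_n} \\
&\ge G_{\alpha^*_n}(s'_{\alpha^*_n}) - c\,x^{(n)} - m_{\alpha^*_n} \\
&\ge v_{\alpha^*_n}(s'_{\alpha^*_n}) + c\,s'_{\alpha^*_n} - c\,x^{(n)} - m_{\alpha^*_n} = u_{\alpha^*_n}(s'_{\alpha^*_n}) + c\,(s'_{\alpha^*_n} - x^{(n)}),
\end{aligned}$$

where the first and the last equalities follow from the definition of the functions $u_\alpha$, the second equality follows from (6.19) and from the optimality of the $(s'_{\alpha^*_n}, S'_{\alpha^*_n})$ policies for the discount factors $\alpha^*_n$, the first inequality follows from Statement 2(c) of Lemma 6.7, and the last inequality follows from (6.19). Since $s'_{\alpha^*_n} \to s'$ and $x^{(n)} \to s'$,

$$\tilde u(s') = \lim_{n\to\infty} u_{\alpha^*_n}(x^{(n)}) = \lim_{n\to\infty} u_{\alpha^*_n}(s'_{\alpha^*_n}),$$

where the second equality holds because the inequality above gives $\tilde u(s') \ge \limsup_{n\to\infty} u_{\alpha^*_n}(s'_{\alpha^*_n})$, while the definition (4.5) of $\tilde u$ gives $\tilde u(s') \le \liminf_{n\to\infty} u_{\alpha^*_n}(s'_{\alpha^*_n})$.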

Moreover, since $0 \in A_{\alpha^*_n}(s'_{\alpha^*_n})$ for all $n = 1, 2, \ldots$, Theorem 4.3(ii) implies that $0 \in A_{\tilde u}(s')$. Thus, the $(s', S')$ policy is average-cost optimal.

Now let $D = 0$ almost surely. As explained in the paragraph preceding Proposition 6.3, the $(0,0)$ policy $\phi$ is average-cost optimal. Let us prove that

$$\lim_{\alpha \uparrow 1} s_\alpha = \lim_{\alpha \uparrow 1} S_\alpha = 0. \qquad (6.28)$$

Let $\alpha \in (0,1)$. Consider an arbitrary policy $\sigma$. Since $v^{\sigma}_\alpha(x) \ge v^{\phi}_\alpha(x)$ when $x \ge 0$, then $v_\alpha(x) = v^{\phi}_\alpha(x) = h(x)/(1-\alpha)$ for all $x \ge 0$. This formula and (6.22) imply $G_\alpha(x) = cx + h(x)/(1-\alpha)$ for $x \ge 0$. Thus, the function $G_\alpha(x)$ is increasing when $x \in [0, \infty)$. This implies $S_\alpha \le 0$. Since $s_\alpha \le S_\alpha$, then $s^* := \liminf_{\alpha \uparrow 1} s_\alpha \le 0$. To complete the proof of (6.28), we need to show that $s^* = 0$. Indeed, let us assume that $s^* < 0$. Fix an arbitrary $x \in (s^*, 0)$. Then there exists a sequence $\alpha_n \uparrow 1$ such that $s_{\alpha_n} \to s^*$ as $n \to \infty$ and $s_{\alpha_n} < x$, $n = 1, 2, \ldots$. The $(s_{\alpha_n}, S_{\alpha_n})$ policy is optimal for the discount factor $\alpha_n$, and this policy does not order at the state $x$, $n = 1, 2, \ldots$. Therefore, $v_{\alpha_n}(x) = h(x)/(1 - \alpha_n) \to +\infty$ as $n \to \infty$. However, $v_{\alpha_n}(x) \le K - cx$. This implies that the $(s_{\alpha_n}, S_{\alpha_n})$ policy cannot be optimal for a discount factor $\alpha_n > (K - cx - h(x))/(K - cx)$. ■
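The final step of the proof compares two quantities at a state $x < 0$ when $D = 0$: never ordering costs $h(x)/(1-\alpha)$, while the bound $K - cx$ on $v_\alpha(x)$ does not depend on $\alpha$. The sketch below computes the crossover $\bar\alpha = (K - cx - h(x))/(K - cx)$, above which not ordering at $x$ cannot be optimal. The function names, the piecewise-linear $h$, and the numbers are assumptions chosen for illustration.

```python
def never_order_cost(x, alpha, h):
    """Discounted cost of never ordering at level x when D = 0: the level stays at x."""
    return h(x) / (1.0 - alpha)


def crossover_alpha(x, K, c, h):
    """Discount factor above which h(x)/(1 - alpha) exceeds the bound K - c*x, for x < 0."""
    if x >= 0:
        raise ValueError("the argument concerns states x < 0")
    bound = K - c * x                  # upper bound on v_alpha(x) used in the proof
    return (bound - h(x)) / bound


def h(y):
    # Illustrative convex cost: holding rate 1, backordering rate 2 (assumed values).
    return max(y, 0) + 2 * max(-y, 0)


x, K, c = -1.0, 5.0, 1.0
print(crossover_alpha(x, K, c, h))                  # (6 - 2) / 6, about 0.667
print(never_order_cost(x, 0.7, h) > K - c * x)      # True
print(never_order_cost(x, 0.6, h) > K - c * x)      # False
```

Because $h(x)/(1-\alpha)$ increases in $\alpha$ while $K - cx$ does not depend on $\alpha$, the comparison flips once, at $\bar\alpha$, which is why the policy that keeps $s_{\alpha_n} < x$ fails for all $\alpha_n$ close enough to 1.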

For $N = 1, 2, \ldots$, we shall write $G_{N,\alpha}$ instead of $G_{N,F,\alpha}$ if $F(x) = 0$ for all $x \in \mathbb{R}$.


Lemma 6.11 Suppose there exist $z, y \in \mathbb{R}$ such that $z < y$ and

$$\frac{h(y) - h(z)}{y - z} < -c. \qquad (6.29)$$

Then $G_\alpha(x) \to +\infty$ and $G_{N,\alpha}(x) \to +\infty$ as $|x| \to \infty$ for all $\alpha \in [0,1)$ and for all $N \ge 0$, and these functions are $K$-convex.
Proof. Observe that the assumption in Lemma 6.11 is equivalent to the existence of $z, y \in \mathbb{R}$ such that $z < y$ and

$$\frac{\mathbb{E}[h(y - D) - h(z - D)]}{y - z} < -c. \qquad (6.30)$$

Indeed, since $h$ is convex, $h(y - d) - h(z - d) \le h(y) - h(z)$ for all $d \ge 0$, and therefore (6.29) implies (6.30). Also, (6.30) implies that for some $d \ge 0$ inequality (6.29) holds for $y := y - d$ and $z := z - d$.

According to (6.21), $G_{N,\alpha}(x) \to \infty$ as $x \to \infty$ for all $N = 0, 1, \ldots$. We show that the result continues to hold when $x \to -\infty$. Suppose $z < y$ satisfy (6.30). Inequality (6.30) can be rewritten as

$$cy + \mathbb{E}\,h(y - D) < cz + \mathbb{E}\,h(z - D).$$

Thus, $G_{0,\alpha}(z) > G_{0,\alpha}(y)$. Since $G_{0,\alpha}$ is convex, $G_{0,\alpha}(x) \to \infty$ as $x \to -\infty$. According to (6.21) and (6.22), and since the value functions are nonnegative,

$$G_{N,\alpha}(x) = G_{0,\alpha}(x) + \alpha\,\mathbb{E}\,v_{N,\alpha}(x - D) \ge G_{0,\alpha}(x), \quad N = 1, 2, \ldots,$$

$$G_\alpha(x) = G_{0,\alpha}(x) + \alpha\,\mathbb{E}\,v_\alpha(x - D) \ge G_{0,\alpha}(x),$$

where $G_{0,\alpha}(x) \to +\infty$ as $x \to -\infty$. ■
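For linear holding and backordering costs, condition (6.29) reduces to a comparison of rates. The sketch below assumes $h(x) = h_+ \max(x, 0) + p \max(-x, 0)$ (an illustrative choice, not a requirement of Lemma 6.11) and a demand with finitely many values; it searches grid pairs for (6.29) and evaluates $G_{0,\alpha}(x) = cx + \mathbb{E}\,h(x - D)$. For such $h$ every secant slope is at least $-p$, so (6.29) holds exactly when $p > c$. All function names are hypothetical.

```python
def linear_cost(h_plus, p):
    """Holding rate h_plus for x > 0 and backordering rate p for x < 0 (illustrative)."""
    def h(x):
        return h_plus * max(x, 0) + p * max(-x, 0)
    return h


def satisfies_6_29(h, c, grid):
    """True if some grid pair z < y has (h(y) - h(z)) / (y - z) < -c."""
    pts = sorted(grid)
    return any((h(y) - h(z)) / (y - z) < -c
               for i, z in enumerate(pts) for y in pts[i + 1:])


def G0(x, c, h, demand_pmf):
    """G_{0,alpha}(x) = c x + E h(x - D) for a demand with finitely many values."""
    return c * x + sum(prob * h(x - d) for d, prob in demand_pmf.items())


h = linear_cost(1.0, 3.0)                  # backordering rate p = 3
grid = [float(k) for k in range(-10, 11)]
pmf = {0: 0.5, 2: 0.5}
print(satisfies_6_29(h, 2.0, grid), G0(-20, 2.0, h, pmf), G0(-10, 2.0, h, pmf))  # True 23.0 13.0
print(satisfies_6_29(h, 3.0, grid), G0(-20, 3.0, h, pmf), G0(-10, 3.0, h, pmf))  # False 3.0 3.0
```

With $p = c$ the function $G_{0,\alpha}$ is constant on $(-\infty, 0]$, so it does not tend to $+\infty$ as $x \to -\infty$; this is the borderline case in which (6.29) fails and Lemma 6.11 gives no conclusion.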

Theorem 6.12 Under the condition stated in Lemma 6.11, the following statements hold for each discount factor $\alpha \in [0,1)$.

(i) For $t = 0, 1, \ldots$, consider real numbers $S_{t,\alpha}$ satisfying (6.25) and $s_{t,\alpha}$ defined in (6.26) with $g(x) = G_{t,\alpha}(x)$, $x \in \mathbb{R}$. Then for every $N = 1, 2, \ldots$ the $(s_t, S_t)$ policy with $s_t = s_{N-t-1,\alpha}$ and $S_t = S_{N-t-1,\alpha}$, $t = 0, 1, \ldots, N-1$, is optimal for the $N$-horizon problem with the zero terminal values.

(ii) Consider real numbers $S_\alpha$ satisfying (6.25) and $s_\alpha$ defined in (6.26) for $g(x) := G_\alpha(x)$, $x \in \mathbb{R}$. Then the $(s_\alpha, S_\alpha)$ policy is optimal for the infinite-horizon problem with the discount factor $\alpha$. Furthermore, the sequence of pairs $\{(s_{t,\alpha}, S_{t,\alpha})\}_{t=0,1,\ldots}$ considered in Statement (i) is bounded, and, if $(s^*, S^*)$ is a limit point of this sequence, then the $(s^*, S^*)$ policy is optimal for the infinite-horizon problem with the discount factor $\alpha$.

Proof. Observe that $G_{0,\alpha}(x) = cx + \mathbb{E}\,h(x - D)$. This function is convex and, in view of Lemma 6.11, $G_{0,\alpha}(x) \to \infty$ as $|x| \to \infty$. The rest of the proof coincides with the proof of Theorem 6.10 with the functions $G_{t,v^0_\alpha,\alpha}$ replaced with the functions $G_{t,\alpha}$. ■

By using the results of this section, Feinberg and Liang [18, 19] obtained additional results for the inventory control problem. Feinberg and Liang [19] described the structure of optimal policies for all values of the discount factor $\alpha > 0$ for finite-horizon problems and for all values of $\alpha \in [0,1)$ for infinite-horizon problems. In particular, the smallest possible values of the discount factor $\alpha^*$ mentioned in Theorem 6.10 are computed in [19]. Though the general theory of MDPs implies that the value functions $v_{t,\alpha}(x)$, $G_{t,\alpha}(x)$, $v_\alpha(x)$, and $G_\alpha(x)$ are lower semi-continuous in $x$, it is proved in [19] that these functions are continuous. In particular, these continuity properties imply that, for total discounted cost criteria with finite and infinite horizons, the decisions to order up to the level $S$ ($S_t$) are also optimal at the states $s$ ($s_t$). Feinberg and Liang [18] proved that, for the inventory control problem, the average-cost optimality inequality (6.20) holds in the stronger form of the optimality equation, the convergences $u_\alpha(\cdot) \to u(\cdot)$ and $G_\alpha(\cdot) \to H(\cdot)$ take place as $\alpha \uparrow 1$, and the functions $u(x)$ and $H(x)$ are $K$-convex and continuous. Therefore, average-cost optimal $(s, S)$ policies can be derived from the optimality equation, and the decision to place an order up to the level $S$ at the state $s$ is also optimal for the average-cost criterion.

Remark 6.13 This remark comments on the assumptions $\alpha \in [0,1)$, $K > 0$, and $c > 0$. All the results of this paper stated for the finite horizon hold with the same proofs for arbitrary $\alpha > 0$; see Feinberg and Liang [19] for details. If $K = 0$, then it is well known that it is possible to set $s = S$ and $s_t = S_t$ for the corresponding optimal $(s, S)$ and $(s_t, S_t)$ policies; see, e.g., Heyman and Sobel [22, Proposition 3-1]. Such policies are called base-stock or $S$-policies. Indeed, this follows from Lemma 6.8 and (6.18), (6.19) for discounted problems, and then from Theorem 6.10(iii) for problems with average costs per unit time. If $c = 0$, then Assumption W* holds. In particular, the function $c(x, a) = K\mathbf{1}_{\{a > 0\}} + \mathbb{E}\,h(x + a - D)$ is $\mathbb{K}$-inf-compact as the sum of a lower semi-continuous function and a $\mathbb{K}$-inf-compact function; see Theorem 5.3(i). All the results formulated in the paper for a fixed discount factor $\alpha \in [0,1)$ remain correct for $c = 0$. Furthermore, inequality (6.29) holds, and therefore the conclusions of Theorem 6.12 hold. However, the function $c$ is not inf-compact when $c = 0$. For example, $c(-a, a) = K + \mathbb{E}\,h(-D) \not\to +\infty$ as $a \to +\infty$. The proof of Assumption B in Proposition 6.3 is based on Theorem 4.2, which uses the assumption that the function $c$ is inf-compact. So, for the long-term average-cost criterion, the results of this paper do not cover the case $c = 0$.

Remark 6.14 For the inventory control problem, we have considered an MDP with $\mathbb{X} = \mathbb{R}$ and $A(x) = \mathbb{R}_+$ for each $x \in \mathbb{X}$. However, if the demand takes only integer values, for many problems it is natural to consider $\mathbb{X} = \mathbb{Z}$ and $A(x) = \mathbb{Z}_+$, where $\mathbb{Z}$ is the set of integers and $\mathbb{Z}_+$ is the set of nonnegative integers. Therefore, if the demand is integer, we have two MDPs for the inventory control problem: an MDP with $\mathbb{X} = \mathbb{R}$ and an MDP with $\mathbb{X} = \mathbb{Z}$. All of the results of this paper also hold for the second representation, when the state space is integer, with the minor modification that the action sets are integer as well. In fact, the case $\mathbb{X} = \mathbb{Z}$ is slightly easier because every function on $\mathbb{Z}$ is continuous and therefore lower semi-continuous.
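With integer demand and $\mathbb{X} = \mathbb{Z}$, the recursions behind Theorem 6.12(i) can be run directly. The sketch below performs finite-horizon value iteration with zero terminal values on a truncated integer grid and records a threshold pair $(s_{t,\alpha}, S_{t,\alpha})$ at each stage. It assumes the recursion $G_{t,\alpha}(x) = cx + \mathbb{E}\,h(x - D) + \alpha\,\mathbb{E}\,v_{t,\alpha}(x - D)$ and $v_{t+1,\alpha}(x) = \min\{G_{t,\alpha}(x), K + \min_{y \ge x} G_{t,\alpha}(y)\} - cx$, and grid forms of (6.25) and (6.26) in which $S$ is the first minimizer of $G_{t,\alpha}$ and $s$ is the smallest grid point at or below $S$ with $G_{t,\alpha}(s) \le K + G_{t,\alpha}(S)$. Clamping $x - D$ to the grid is a truncation assumption, so values near the boundaries are approximate; the function name and all numbers are illustrative.

```python
def stage_thresholds(lo, hi, demand_pmf, K, c, alpha, h, horizon):
    """Value iteration on the integer grid lo..hi with zero terminal values.

    Returns [(s_0, S_0), (s_1, S_1), ...], where (s_t, S_t) is computed from G_{t,alpha}.
    """
    xs = list(range(lo, hi + 1))

    def idx(y):
        # Truncation assumption: levels outside [lo, hi] are clamped to the boundary.
        return min(max(y, lo), hi) - lo

    v = [0.0] * len(xs)                        # v_{0,alpha} = F = 0
    thresholds = []
    for _ in range(horizon):
        # G_{t,alpha}(x) = c x + E[h(x - D) + alpha * v_{t,alpha}(x - D)]
        G = [c * x + sum(prob * (h(x - d) + alpha * v[idx(x - d)])
                         for d, prob in demand_pmf.items())
             for x in xs]
        # best[k] = min over grid levels y >= xs[k] of G(y)
        best = G[:]
        for k in range(len(xs) - 2, -1, -1):
            best[k] = min(best[k], best[k + 1])
        j = min(range(len(xs)), key=G.__getitem__)             # S_t: first minimizer
        i = next(k for k in range(j + 1) if G[k] <= K + G[j])  # s_t
        thresholds.append((xs[i], xs[j]))
        # v_{t+1,alpha}(x) = min{G(x), K + min_{y >= x} G(y)} - c x
        v = [min(G[k], K + best[k]) - c * xs[k] for k in range(len(xs))]
    return thresholds


def h(x):
    # Illustrative convex cost: holding rate 1, backordering rate 4 (so (6.29) holds for c = 1).
    return max(x, 0) + 4 * max(-x, 0)


pmf = {0: 0.25, 1: 0.5, 2: 0.25}
th = stage_thresholds(-30, 30, pmf, K=5.0, c=1.0, alpha=0.9, h=h, horizon=60)
print(th[0], th[-1])                 # th[0] is (-1, 1)
th0 = stage_thresholds(-30, 30, pmf, K=0.0, c=1.0, alpha=0.9, h=h, horizon=60)
print(all(s == S for s, S in th0))   # True: with K = 0 every stage is base stock
```

In the $N$-horizon problem, the pair computed at stage $N - t - 1$ is the one used at step $t$, as in Theorem 6.12(i); the $K = 0$ run illustrates the base-stock case of Remark 6.13.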